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1. Introduction 


The Graphics Library (GX) is a programmer’s interface to the Nintendo GameCube™ Graphics Processor 
(GP). We intend GX to be as thin as possible in order to achieve high performance, but it must also provide 
a logical and straightforward view of the hardware. Our design goal for GX is to provide a default configu- 
ration of the hardware so that, initially, the programmer can concentrate on the basics without being over- 
whelmed by unnecessary details. Later, as the hardware becomes more familiar, programmers can easily 
override the default configuration to expose more flexibility and features. 

1.1 Document organization 


This document serves as a starting point for graphics programmers to learn about Nintendo GameCube's 
graphics capabilities. Chapters are organized as follows: 


¢« Chapter 2 presents a simple code example. 

¢« Chapter 3 discusses system initialization and presents a Graphics Processor block diagram. 

* Chapters 4-12 document the GX functions to control the graphics pipeline (roughly in pipeline order). 
¢« Chapter 13 explains the CPU-to-graphics interface in more detail. 


« Chapter 14 discusses how to gather performance statistics, the CPU-to-EFB interface, and the GX 
verify system. 


* Chapter 15 explains the additional features of HW2 and how they are supported by GX. 
* Chapter 16 lists non-orthogonal features and warnings for developers. 
« Chapters 17-18 compare GX to two existing systems: Nintendo 64 (N64) and OpenGL. 


¢« Appendix A lists all the GX API function calls. (For detailed information about each function, refer to the 
GX pages in the online Dolphin Reference Manual.) 


« Appendix B lists the default state set by GxInit. 
« Appendix C provides more details on the display list format. 
¢« Appendix D outlines the texture format used by Nintendo GameCube. 


« Appendix E discusses in-memory data alignment and coherence issues. 


1.2 Syntax notes 


All of the Graphics library functions are prefixed by “GX,” Video Interface library functions by “VI,” and 
Matrix-Vector library functions by “MTX.” For details on these other libraries, refer to the corresponding 
sections in this guide. Functions prefixed with “OS” and “PAD” are described in “Operating System” and 
“Controller Library (PAD),” respectively, in the Nintendo GameCube Programmer's Guide. 
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2 Graphics Library (GX) 


While some VI and MTX functions are required to write a basic application, this overview is concerned pri- 
marily with the GX library. GX functions follow the naming conventions listed below. 


Note: Here, and in other places throughout the document, an asterisk (*) is used to indicate a wildcard. 


* GXSet* functions immediately set state in the graphics hardware. More accurately, they send state 
commands through a command FIFO which is then read by the Graphics Processor and routed to the 
proper register(s) (see "13 Graphics FIFO" on page 135). 


* GXInit* functions precompile state registers, which are stored in various types of objects (struc- 
tures). 


* GXLoad* functions set indexed state, usually from a precompiled object (structure). Indexed state 
includes light parameters, matrices, texture state, etc. 


* GXWrite* functions write data directly to CPU-accessible registers and thus set state asynchronously 
with the graphics command pipeline. 


* GxXSet* functions read state back from a shadow copy of the state kept by the GX API. 
* GXRead* functions read data directly from CPU-accessible registers. 


The GX API uses many enumerated types and several structure types. A structure type will be suffixed by 
Obj, Region, or List. GXColor is also a structure type. Any other type without these suffixes is an enu- 
merated type. 


1.3. A note on pointers 


The GX API sometimes expects pointers as arguments to its functions. The Nintendo GameCube operat- 
ing system (see “Operating System” in the Nintendo GameCube Programmer’s Guide) sets up the Gekko 
CPU to treat virtual addresses in a manner similar to a MIPS CPU. That is, the most significant bits (MSBs) 
of a virtual address indicate whether the target data is mapped to cached or uncached memory. The rest of 
the bits are the physical address of the data. The Nintendo GameCube Graphics Processor (GP) ignores 
these MSBs and therefore is only concerned with physical addresses. The application is not required to 
convert virtual addresses to physical addresses on behalf of either the Graphics Processor or the GX API. 


In general, the application will be working with cache-mapped data. If the application is accessing the 
same data as the Graphics Processor, the application must be careful to flush the data from the CPU 
cache before the Graphics Processor uses it. The Graphics Processor has no visibility to data in the CPU 
cache. For examples, see Appendix E. 


1.4 Useful books 


This document assumes that you know how to program in the C language, and that are you are familiar 
with 3D graphics programming and common 3D APIs such as OpenGL or Direct3D. If this is not the case, 
you might want to do some preparatory reading. Here are some useful sources (check your local book- 
store or library for the latest editions): 


Foley, James D., et al., Computer Graphics: Principles and Practice, 2nd Ed., Addison-Wesley, Reading, 
MA, 1990. 


Kempf, Renate and Chris Frazier (eds.), OpenGL Reference Manual, 2nd Ed., Addison-Wesley, Reading, 
MA, 1997. 


Woo, Mason, et al., OpenGL Programming Guide, 2nd Ed., Addison-Wesley, Reading, MA, 1997. 
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2 Code example: onetri.c 


Onetri.c is a fairly simple example that shows the basics of initializing graphics, making vertex formats, 
and drawing some flat-shaded primitives. All the data and code are included in a single file. This demo is 
also available in the source tree at /dolphin/build/demos/gxdemo/src/Simple/smp-onetri.c. 


Note: Despite the name, this demo draws more than one triangle. 
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Code 1 - onetri.c 


#include <demo.h> 


#define STRUT_LN 
#define STRUT_SD 
#define JOINT_SD 


Graphics Library (GX) 


ja hte hp pp oe ed th eh ewe heheh tp he Se ihe keeper he eh et ep. te te ete en to: Ol. * 
a a a a a a a a a a a a a a a el cer eee ere a */ 
130 // long side of strut 
4 // short side of strut 
10 // joint is a cube 
gi iy th i li. hk hn ns. i th, in hh in hin i i ti, am ui: te: ee, el, eam, Sey * 


The macro ATTRIBUTE ALIGN provides a convenient way to align initialized 


















































arrays. Alignment of vertex arrays to 32B IS NOT required, but may result 
in a slight performance improvement. 
SSS See Se eS ee eee eee ee a eee ee eee oe ee eee eee ee eee eee oe */ 
sl6 Verts_sl6[] ATTRIBUTE ALIGN(32) = 
{ 
// x y Zz 
-STRUT_SD, STRUT_SD, -STRUT_SD, // 0 
STRUT_SD, STRUT_SD, -STRUT_SD, // 1 
STRUT_SD, STRUT_SD, STRUT_SD, // 2 
-STRUT_SD, STRUT_SD, STRUT_SD, // 3 
STRUT_SD, -STRUT_SD, -STRUT_SD, // 4 
STRUT_SD, -STRUT_SD, STRUT_SD, eae) 
STRUT_SD, STRUT_LN, -STRUT_SD, // 6 
STRUT_SD, STRUT_LN, STRUT_SD, // 7 
-STRUT_SD, STRUT_LN, STRUT_SD, // 8 
-STRUT_SD, STRUT_SD, -STRUT_LN, L138 
STRUT_SD, STRUT_SD, -STRUT_LN, // 10 
STRUT_SD, -STRUT_SD, -STRUT_LN, // 121 
STRUT_LN, STRUT_SD, -STRUT_SD, // 12 
STRUT_LN, STRUT_SD, STRUT_SD, kee 13 
STRUT_LN, -STRUT_SD, STRUT_SD, // 14 
-JOINT_SD, JOINT SD, -JOINT_SD, // 15 
JOINT SD, JOINT SD, -JOINT_SD, // 16 
JOINT SD, JOINT SD, JOINT SD, // 17 
-JOINT_SD, JOINT SD, JOINT SD, // 18 
JOINT SD, -JOINT_SD, -JOINT_SD, // 19 
JOINT SD, -JOINT_SD, JOINT SD, // 20 
-JOINT_SD, -JOINT_SD, JOINT _SD // 21 
}; 
u8 Colors rgba8[] ATTRIBUTE ALIGN(32) = 
{ 
// Ly Gy Dy 2 
219 Zl y- 25, 255% // 0 
40, 40, 40, 255, // 1 
Sg 0 TSS p- 259 // 2 
}; 
/* ----- * 
Forward references 
aes ee See ee See eee eee eee ee ee eo ee ee ee oe ee oe oe eee eee ee, */ 
void main ( void ); 
static void CameraInit ( Mtx v ); 
static void DrawInit ( void ); 
static void DrawTick ( Mtx v ); 
static void AnimTick ( Mtx v ); 
static void PrintIntro ( void )¥ 
/* —-------- * 
Application main loop 
it Es eet ae a eh eed at ae es Se ee a ee a a ae a a Se ee ee See eee */ 
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Code exa 


void mai 
{ 
Mtx 
PADS 
pad[ 
DEMO 


Came 
Draw 


whil 
{ 


OSHa 


mople: onetri.c 5 


n ( void ) 


v; // view matrix 
tatus  pad[PAD MAX CONTROLLERS]; // Controller state 


0] .button = 0; 


Init (NULL); // Init os, pad, gx, vi 

raInit(v); // Initialize the camera. 

Init(); // Define my vertex formats and set array pointers. 
e(!pad[0].button) 

DEMOBeforeRender () ; 

DrawTick(v); // Draw the model. 

DEMODoneRender () ; 

AnimTick(v) ; // Update animation. 


PADRead (pad) ; 


lt ("End of demo"); 


/* a hh heheh Sel lel el, i tl, el le el ie el, elle ea ti eel ma ec tel i lel mae el heal i el el mae el tel es, le co Sl Se fom me Sei gmeSe * 
Functions 
PON cc ce ce mh cee ceed nr ct ce che cect ce ec ce nn ct ae a ee nn ce ce ce te ce nc ch ae ee ce nc ch ce te cee ne eee em utp ute ee */ 
/* Be et ye EF EEF AFAR AFC FRANCA FRA FAFA FEE EE PALE PAE AGRE HEE PEE PR SE PBS EP ee ES * 
Name CameralInit 
Description: Initialize the projection matrix and load into hardware. 
Initialize the view matrix. 
Arguments: Vv view matrix 
Returns: none 
IE ah Soe A Fo A FALE OE TY a is AGF pg iy PUB Oy Ty CE UA Oy i TE oe REE i ch Ti CEE EEL ERED: Eien ED ETD Bee */ 
static void CameraInit ( Mtx v ) 
{ 
Mtx44 ion // projection matrix 
Vec up = {0.20F, 0.97F, 0.0F}; 
Vec camLoc = {90.0F, 110.0F, 13.0F}; 
Vec ob jPt = {=LLOOF, =O VOR, =190,0F}; 
£32 left = 24.0F; 
£32 top = 32.0F; 
£32 near = 50.0F; 
£32 far = 2000.0F; 
MTXFrustum(p, left, -left, -top, top, near, far); 
GXSetProjection(p, GX_PERSPECTIVE) ; 
MTXLookAt(v, &camLoc, &up, &Obj}Pt); 
} 
/* RS a a et at se ee a Oe at oe OO aE a a a EN a a et ie Ee a eee Ee ae ee tie LER a igh fd SED pes tah pees aD peed hap bees Majeed GDI * 
Name DrawInit 
Description: Initializes the vertex attribute format 0, and sets 
the array pointers and strides for the indexed data. 
Arguments: none 
Returns: none 
YN se ae ec ee ce a ce ce a ce ee ec ect cel ec ac ec cen ae cee cn ce cee ee ce cen cel cen ae cee cen ne cea cnt ep ep ptm ep at ep Co */ 
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Graphics Library (GX) 


static void DrawInit( void ) 


{ 


GxColor black = {0, 0, 0, O}; 
GXSetCopyClear (black, GX_MAX 224); 


// Set current vertex descriptor to enable position and color0. 
// Both use 8b index to access their data arrays. 
GxClearVtxDesc(); 

GXSetVtxDesc (GX_VA_POS, GX_INDEX8); 

GXSetVtxDesc (GX_VA_CLRO, GX_INDEX8) ; 


// Position has 3 elements (x,y,z), each of type sl16, 
// no fractional bits (integers) 
GXSetVtxAttrFmt (GX _VTXFMTO, GX VA POS, GX POS XYZ, GX S16, 0); 





// Color 0 has 4 components (r, g, b, a), each component is 8b. 
GXSetVtxAttrFmt (GX _VTXFMTO, GX_VA_CLRO, GX_CLR_RGBA, GX_RGBA8, 0); 


// stride = 3 elements (x,y,z) each of type sl6 
GxXSetArray(GX_VA_POS, Verts_sl6, 3*sizeof(s16)); 
// stride = 4 elements (r,g,b,a) each of type us 
GxXSetArray(GX_VA_CLRO, Colors rgba8, 4*sizeof(u8)); 





// Initialize lighting, texgen, and tev parameters 

GXSetNumChans (1); // default, color = vertex color 

GXSetNumTexGens (0); // no texture in this demo 

GXSetTevOrder (GX_TEVSTAGEO, GX_TEXCOORD_ NULL, GX_TEXMAP NULL, GX_COLOROA0) ; 
GXSetTevOp (GX_TEVSTAGEO, GX_PASSCLR) ; 


/* at a a a ee at et ae at ee at et at at a tet at et aot et a a et ee iat eet st oath et ate th gt St tt tt et tt et et CD pd pl fa al igh il fh pel pipes ach * 
Name Vertex 
Description: Create my vertex format 
Arguments: v 8-bit position index 
fe} 8-bit color index 
Returns: none 
YN a Cee a a Cn cy Cen Co Co cn cng cg ft cee nl ec Cc nec esc ee ce cel me tt ep emg ot ep et I */ 
static inline void Vertex( u8 v, u8 c ) 
{ 
GXPosition1x8 (v); 
GXColor1x8(c); 
} 
/* a an ea cy Sane na cy a a ae a a a a a an a a UR a a a a a NN a a a a a, cg a a ay a, Pa ne ey ee ey SE pam ees pene ates pes * 
Name: DrawFsQuad 
Description: Draw a flat-shaded quad. 
Arguments: v0 8-bit position index 0 
vl 8-bit position index 1 
v2 8-bit position index 2 
v3 8-bit position index 3 
re) 8-bit color index 
Returns: none 
Ta a a gt ee et nen a eae ee Ea wae wa a oe eas */ 
static inline void DrawFsQuad ( 
u8 vO, 
u8 v1, 
u8 v2, 
u8 v3, 
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Code example: one 


we c).) 


Vertex 
Vertex 


Vertex 


(v0, 
(v1, 
(v2, 
Vertex (v3 


Description: 


Arguments: 
Returns: 


static void Draw 
{ 


tri.c 


DrawTick 


Draw the 


many times in the x, 


model once. Replicates a simple strut model 


y, z directions to create a dense 


3D grid. GXInit makes GX_PNMTXO the default matrix. 
Vv view matrix 
none 


Tick( Mtx v 


) 











£32 x; // Translation in x. 
£32 y; // Translation in y. 
£32 z; // Translation in z. 
Mtx m; // Model matrix. 
Mtx mv; // Modelview matrix. 
MTXIdentity (m) ; 
for(x = -10*STRUT_LN; x < 2*STRUT_LN; x += STRUT_LN) 
{ 
for(y = -10*STRUT_LN; y < STRUT_LN; y += STRUT_LN) 
{ 
for(z = STRUT_LN; z > -10*STRUT_LN; Zz -= STRUT_LN) 
{ 
MTXRowCol(m, 0, 3) = x; 
MTXRowCol(m, 1, 3) = y; 
MTXRowCol(m, 2, 3) = Z; 
MTXConcat(v, m, mv); 
GXLoadPosMtxImm(mv, GX_PNMTX0) ; 
GXBegin(GX_QUADS, GX_VTXFMTO, 36); //4 vtx/qd x 9 qd = 36 vtx 
DrawFsQuad(8, 7, 2, 3, 0); 
DrawFsQuad(1, 2, 7, 6, 1); 
DrawFsQuad(1l, 0, 9, 10, 2); 
DrawFsQuad(4, 1, 10, 11, 1); 
DrawFsQuad(1, 12, 13, 2, 2); 
DrawFsQuad(2, 13, 14, 5, 0); 
DrawFsQuad(18, 15, 16, 17, 2); 
DrawFsQuad(20, 17, 16, 19, 1); 
DrawFsQuad(20, 21, 18, 17, 0); 
GXEnd(); 
} 
} 
} 
} 
/* ik fl hl hhh k,n, fA, lh eS hl hh Ns, GN lh, hh, lh Jal, i ew gm Sue uum sen ps ees og * 
Name AnimTick 
Description: Moves viewpoint through the grid. Loops animation so 
that it appears viewpoint is continously moving forward. 
Arguments: Vv view matrix 
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static void AnimTick( Mtx v ) 
{ 
static u32 ticks = 0; // Counter. 


Mtx fwd; // Forward stepping translation matrix. 
Mtx back; // Loop back translation matrix. 

u32 animSteps = 100; 

£32 animLoopBack = (f£32)STRUT_LN; 


£32 animStepFwd = animLoopBack / animSteps; 


MTXTrans (fwd, 0, 0, animStepFwd) ; 
MTXTrans (back, 0, 0, -animLoopBack) ; 


MTXConcat(v, fwd, v); 
if((ticks % animSteps) == 0) 


MTXConcat(v, back, v); 


tickst+t+; 


The following library functions are explained in the following sections: 





* DEMOInit ("3.1 Video initialization" on page 9). 

* GXInit ("3.2 Graphics initialization" on page 9). 

* GXClearVtxDesc, GXSetVtxDesc ("4.1 Describing the vertex data” on page 12). 

* GXSetArray ("4.2 Describing arrays" on page 14 and "4.5.1 Indexed vertex data" on page 25). 
* GXSetVtxAttrFmt ("4.3 Describing attribute data formats” on page 15). 

* GXBegin/GXEnd ("4.4 Drawing graphics primitives" on page 17). 

* GXPosition, GXColor ("4.4.1 Primitive types" on page 17). 





* GXLoadPosMatrixImm ("5.1 Loading a modelview matrix" on page 34). 

* GXSetProjection ("5.2 Setting a projection matrix" on page 35). 

* GXSetNumChans ("6.1.3 Spotlights, directional lights and angle attenuation” on page 41). 
* GXSetNumTexGens ("7.1 Specifying texgens" on page 53). 

* GXSetTevOp ("9.3 Number of active TEV stages" on page 90) 


* GXSetTevOrder ("9.11 Texture pipeline configuration" on page 100). 
* GXSetCopyClear ("12.1.7 Clear color and Z for next frame" on page 127). 








All libraries available in Nintendo GameCube can be accessed using a single header file, <dolphin.h>. 
This header file is included here by way of the <demo.h> header file. 
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3. Initialization 


3.1 Video initialization 





In the preceding code segment, the DEMOInit function initializes the operating system, Controller, graph- 
ics, and video. The DEMOTnit function takes a parameter that is a pointer to the render mode to be used. 
If you pass in a NULL pointer, then a default mode is chosen based upon the video standard. 


For NTSC, the default render mode is 640x480, non-antialiased, double-buffered, deflickered, with a 
frame-rendering rate of 30Hz. On the 640x480 screen, some pixels are trimmed off to account for over- 
scan. Currently, DEMOITnit trims 16 pixels from the top and bottom of the screen, resulting in an actual 
size of 640x448. For more information about render modes, see "12 Video output" on page 125 or refer to 
the relevant pages in the Dolphin Reference Manual (HTML). 











The DEMO library encapsulates common functionality for the Nintendo GameCube demo programs. For 
more information on the DEMo library, refer to the Demonstration Library (DEMO) section in this guide. 





3.2 Graphics initialization 


The Graphics Processor (GP) comes out of reset with unknown register values. The GXInit function is 
used to set all the registers in the GP to default values. In the code example, GXInit is called by way of 
DEMOInit. 





GXInit also initializes a FIFO in DRAM that is used to send graphics commands and data from the CPU 
to the GP (for more information on FIFO, see "13 Graphics FIFO" on page 135). The FIFO write port is 
attached to the CPU and the FIFO read port is attached to the GP. This configuration is known as immedi- 
ate-mode, because graphics commands are sent immediately from the CPU to the GP as they are exe- 
cuted. In immediate-mode, the GP will interrupt the CPU when the FIFO is nearly filled, and the GX library 
will automatically suspend the application. 


Note: In multithreaded applications, you must designate a single thread as the “GX thread”. This is the 
thread that will be suspended when the FIFO is nearly filled, and should be the only thread gener- 
ating graphics commands. 


It is also possible to set the FIFO(s) up in a multi-buffered mode, in which the CPU writes commands to 
one FIFO while the GP reads commands from a different FIFO. See "13 Graphics FIFO" on page 135 for 
more information on the graphics FIFO. 
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3.3. Graphics Processor (GP) 
The subsequent chapters in this book roughly follow the order of the logical processing blocks of the GP: 
Figure 1 - Schematic of the GP 
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4 Vertex and primitive data 


The GX API supports indexed and direct vertex data with flexible sizes and types. Here is a list of the terms 
that we will use in the following discussion: 


Attribute: A component of a vertex; i.e., position, normal, color, texture coordinate, or matrix index. 
Each attribute consists of one or more members from an elementary type; for example, a position may 
consist of three floats (x, y, Z). 


Vertex: A group of attributes attached to a point in space; therefore, every vertex must have, at mini- 
mum, a position attribute. 


Primitive: A geometric object described by a group of vertices with the same format. 


Vertex Descriptor: Describes which attributes are present in a particular vertex format and how they 
are transmitted from the CPU to the GP (i.e., either direct or indexed). 


Vertex Attribute Format: Describes the format (type, size, format, fixed point scale, etc.) of each 
attribute in a particular vertex format. 


Vertex Format: A Vertex Attribute Format together with the Vertex Descriptor. 


Direct: When an attribute is GX_ DIRECT, the data representing that attribute is sent to the GP via the 
CPU/Graphics FIFO. 


Indexed: When an attribute is Gx_ INDEX*, an index to the attribute data is sent to the GP via the 
Graphics FIFO. The GP then fetches the actual attribute data automatically by using the index and an 
array pointer. 








To draw a primitive, you should follow these steps: 


1. 


OV RS ON 


Describe which attributes are present in the vertex format, and describe whether the attributes are 
indexed or referenced directly. 


For indexed data, set the array pointers and strides. 
Describe the number of elements in each attribute and their types. 
Describe the primitive type. 


Draw the primitive by sending the GP a stream of vertices that match the vertex description and 
attribute format. 
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Figure 2 - Vertex and attribute description 


EE ~~ All vertices ina primitive have the same Vertex 
= aa ri Descriptor and ‘Yertex Attribute Format 
é 


~ 
~~ 






W~—_———— _ 









Vertex Attribute F onnat 0: 
Position: x,y,z; 16 fixed point, unsigned 14.2 format 
Normal: nx,ny nz; Sb fixed point, signed 1.6 format 
Color: RGBS8S8S format 

TexCoord0: st; floating point format 





Vertex Descriptor: 
Position: 8b Index “ 
Nonmal: & Index 
Color: Direct 

Text oord0: 16b Index 








4.1. Describing the vertex data 


Code 2 - Vertex descriptor 


GxXClearVtxDesc(); 

GXSetVtxDesc(GX_VA_POS, GX_INDEX8) 
GXSetVtxDesc (GX_VA_NRM, GX_INDEX8) 
GXSetVtxDesc (GX_VA_CLRO , GX_DIRECT) 
GXSetVtxDesc (GX_VA_TEXO, GX_INDEX16 


a 


a 


\e 


The GXSetVtxDesc function is used to indicate whether an attribute is present in the vertex data, and 
whether it is indexed or direct. There is only one active vertex descriptor, known as the current vertex 
descriptor. The Gk<ClearVtxDesc command is used to set the value GX_NONE for all the attributes in the 
current vertex descriptor. GX_ NONE indicates that no data for this attribute will be present in the vertex. 
Once cleared, you only need to describe attributes that you intend to provide. The possible attributes are: 








* Position, GX _VA_POS (this attribute is required for every vertex descriptor). 
* Normal, GX_VA_NRM, or normal/binormal/tangent, GX_VA_NBT. 


* Color_0, GX_VA_CLRO. 





* Color_1,GX_VA_CLR1. 





¢ Up to 8 texture coordinates, GX _VA_TEX0-7. 
* A position/normal matrix index, GX_VA_PNMTXIDx. 


* A texture matrix index, GX_VA_TEXOMTXIDX - GX_VA_TEX7MTXIDX. 








These last two attributes are 8-bit indices which can reference a transformation matrix in the on-chip matrix 
memory. This supports simple skinning of a character (for more on skinning, see “Advanced Rendering” in 
this guide). These indices are different from the other attributes in that they may be sent only as direct 
data. 
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Describing the vertex data 


The GP assumes that you will send any specified attribute data in the ascending order shown in the table 


below: 


Note: Texture coordinates must be enabled sequentially, starting at GX_VA_T! 
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Table 1 - Vertex attribute order requirements 


Order Attribute 















































0 GX_VA_PNMTXIDX 

1 GX_VA_TEXOMTXIDX 
5 GX_VA_TEX1MTXIDX 
3 GX_VA_TEX2MTXIDX 
4 GX_VA_TEX3MTXIDX 
5 GX_VA_TEX4MTXIDX 
6 GX_VA_TEX5MTXIDX 
7 GX_VA_TEX6MTXIDX 
8 GX_VA_TEX7MTXIDX 
9 GX_VA_POS 

10 GX_VA_NRM or GX_VA_NBT 
11 GX_VA_CLRO 

12 GX_VA_CLR1 

13 GX_VA_TEXO 

14 GX_VA_TEX1 

15 GX_VA_TEX2 

16 GX_VA_TEX3 

17 GX_VA_TEX4 

18 GX_VA_TEX5 

19 GX_VA_TEX6 

20 GX_VA_TEX7 











EXO. 
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4.2. Describing arrays 


The attributes of a vertex may be indexed or direct (with the exception of Gk_VA_PNMTXIDxX, and 
GX_VA_TEXOMTXIDX-GX_VA_TEX7MTXIDx, which are always direct). For an indexed attribute 
(GX_INDEX8 or GX_INDEX16), you need only to send an index to the attribute data. The GP will use the 
following equation to compute the address of the data: 














Equation 1 - Attribute address 


AttrAddress = AttrBase + Index* AttrStride 

The GX_INDEX8 index type allows a maximum array size of 255 elements (0-254). The index 255 is 
reserved, and indicates that this vertex should be skipped in the command stream. See "15 Multi-resolu- 
tion geometry" on page 73 in the Advanced Rendering section for applications of this feature. 





The GX_INDEX16 index type allows a maximum array size of 65,535 elements (0-65,534). The index Oxffff 
(65,535) is used to indicate that this vertex should be skipped. 


The attribute base pointer (byte-aligned) and stride (in bytes) are set using the GXSetArray function 
(described further in "4.5.1 Indexed vertex data" on page 25). The hardware will read the data described by 
the vertex attribute format (see "4.3 Describing attribute data formats" on page 15) from the array. This 
avoids the need to read the data into the CPU only to copy it back into the graphics FIFO. However, index- 
ing vertex data has cache coherency issues; see Appendix E. 





The Graphics Processor has its own vertex data cache in order to make the fetching of indexed data more 
efficient. The vertex cache is an 8K, 8-way set-associative cache. Notice that each attribute can be stored 
as a separate array. There is no need to pack a vertex structure in memory, because the current vertex 
descriptor and vertex attribute format allow the assembly of vertex data from the various arrays at run time. 
You can invalidate the vertex cache using GXInvalidateVtxCache. This will force the cache to reload 
vertex data. 


A directly referenced attribute (GX DIRECT) will have its data copied directly into the Graphics FIFO. Direct 
data can be used when the data is already available in the CPU cache, or when you are generating data 
algorithmically on the fly. 
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Describing attribute data formats 


4.3. Describing attribute data formats 


The Vertex Attribute Format Table (VAT) allows you to specify the format of each attribute for up to eight 
different vertex formats. The VAT is organized as shown: 


Figure 3 - Vertex Attribute Format Table (VAT) 
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rr a | 
You can store eight predefined vertex formats in the table. For each attribute in a vertex, you can specify 


the following: 


¢ The number of elements for the attribute. 


¢ The format and size information. 


« The number of fractional bits for fixed-point formats using the scale parameter. (The scale parameter is 


not relevant to color or floating-point data.) 
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Code 3 - GXSetVtxAttrFmt 


a7, format index attribute n elements format n frac bits 
GXSetVtxAttrFmt (GX_VTXFMTO , GX_VA_POS, GX_POS_ XYZ, GX_S 8, 0); 
GXSetVtxAttrFmt (GX_VTXFMTO " GX_VA_CLRO , GX_CLR_RGBA, GX_RGBA8 7 -0)3 


The code above defines vertex attribute format zero. GX_VTXFMTO indicates that position is a 3-element 
coordinate (x, y, z) where each element is an 8-bit 2’s complement signed number. The scale value indi- 
cates the number of fractional bits for a fixed-point number, so zero indicates that the data has no fractional 
bits. The GX_VA_CLRO attribute has four elements (r, g, b, a) where each element is 8 bits. 


Notes: 
« The matrix index format is not specified in the table because it is always an unsigned 8-bit value. 


« The scale value is implied for normals (scale = 6 or scale = 14) and not needed for colors. Also, 
normals are assumed to have three elements (Nx, Ny, Nz) for GX _VA_NRM, and nine elements 
(Nx, Ny, Nz, Bx, By, Bz, Tx, Ty, Tz) for GX_VA_NBT. Normals are always signed values. 


. On HW1, the scale value is fixed at zero for 8-bit formats. HW2 does not have this limitation. 


* The normal format (GX VA _NRM) is also used for binormals/tangents (GX_VA_NBT) when they are 
enabled in the current vertex descriptor. 


The VAT in the Graphics Processor has room for eight vertex formats. The idea is to describe most of your 
attribute quantization formats early in the application, loading this table as required. Then you provide an 
index into this table (which specifies the vertex attribute data format) when you start drawing a group of 
primitives using GXBegin. If you require more than eight vertex formats, you must manage the VAT table 
in your application, reloading new vertex formats as needed. 
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4.4 Drawing graphics primitives 


4.4.1 Primitive types 
The following figure illustrates the types of primitives supported: 


Figure 4 - Graphics primitives 
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GX_ POINTS draws a point at each of the rn vertices. Points are described further in "4.4.2 Points and lines" 
on page 18. 





GX_LINES draws a series of unconnected line segments. Segments are drawn between v0 and v7, v2 and 
v3, etc. The number of vertices drawn should be a multiple of 2. Lines are described further in "4.4.2 Points 
and lines" on page 18. 





GX LINESTRIP draws a series of connected lines, from v0 to v7, then from v7 to v2, and so on. If n verti- 
ces are drawn, n-7 lines are drawn. 


GX_ TRIANGLES draws a series of triangles (three-sided polygons) using vertices v0, v7, v2, then v3, v4, 
v5, and so on. The number of vertices drawn should be a multiple of 3, and the minimum number is 3. 








GX_TRIANGLSTRIP draws a series of triangles (three-sided polygons) using vertices v0, v7, v2, then v7, 
v3, v2 (note the order), then v2, v3, v4, and so on. The number of vertices must be at least 3. 





GX_TRIANGLEFAN draws a series of triangles (three-sided polygons) using vertices v0, v7, v2, then vO, v2, 
v3, and so on. The number of vertices must be at least 3. 











GX_QUADS draws a series of non-planar quadrilaterals (4-sided polygons) beginning with vO, v7, v2, v3, 
then v4, v5, v6, v7, and so on. The quad is actually drawn using two triangles, so the four vertices are not 
required to be coplanar. 


Note: The diagonal common edge between the two triangles of a quad is oriented as shown in "Figure 4 
- Graphics primitives" on page 17. The minimum number of vertices is 4. 
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4.4.2 Points and lines 


Points are described by a single vertex, either 2D or 3D, and may be textured or not. You define a point’s 
size using: 


Code 4 - GXSetPointSize 


GXSetPointSize(u8 size, GXTexOffset tex_offset); 


Points are drawn as a square in screen space, centered about the location of the vertex. The size of a 
point may be specified in 1/6 pixel units and the maximum size is 42.5 pixels. If the point is textured, a tex- 
ture coordinate should be generated or supplied per point. This texture coordinate is attached to the top- 
left corner of the point. The other texture coordinates for the other corners of the point are generated using 
tex_offset. 


Note: tex_offset is specified in normalized texture coordinates. 


Figure 5 - Point definition 


tex_offset 





Q 
q : (XY,Z) 
n 


tex_offset 











vy 
Lines are described by two vertices, either 2D or 3D, and may be textured or not. You define a line’s width 
using: 


Code 5 - GXSetLineSize 


GXSetLineSize(u8 width, GXTexOffset tex_offsets) ; 


Lines are centered about the location of the vertices. The small edges of a line will be drawn horizontally or 
vertically, depending upon the slope of the line (see "Figure 6 - Line definition" on page 19). 


Note: Line area is not preserved under rotation. 
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The width of a line is specified in units of 1/6 pixel, and the maximum width is 42.5 pixels. If you are looking 
down the line from the start point to the end point, the starting texture coordinate is attached to the near 
left-hand corner of the line, while the ending texture coordinate is attached to the far left-hand corner. The 
texture coordinates for the right-hand corners are produced by adding the tex_offset value to the corre- 
sponding left-hand corner texture coordinates. The tex_offset is only added to the s component. See the 
figure below for details. 


Figure 6 - Line definition 


width/2 GX_LINESTRIP 


(sO+A, tO) (sO, tO) 









Lines closer to vertical have 


(x0, yO, 20) edges aligned to X axis 


Lines closer to horizontal have 
edges aligned to Y axis 


(sO+A, t0) (s0,t0) (s1,t1)  (s1+A, t1) 


(s0, t0) = t1) 
(sO+A, tO) Scene t1) 
(s1+A, t1) t0) 
(x1, yl, z1) 
(s1, t1) oa t0) 


(s1+A,t1) (st, t1) (s0,t0) — (sO+A, tO) 





(s1+A, t1) > (s1, t1) 
tex_offset 





Note: Wide line strips created using GX_LINESTRIP may have overlapped joints that may show gaps 
and cracks. 
4.4.3 Rasterization rules 


As shown in "Figure 4 - Graphics primitives" on page 17, polygons whose vertices appear in clockwise 
order are defined to be frontfacing. 
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Polygon edges that fall exactly across a sample center will include the sample if the sample lies on a left 
edge of a polygon; samples that fall exactly on right edges will not be included. When an edge is horizon- 
tal, samples that fall exactly on upper edges are included, while samples that fall exactly on lower edges 
are not. A sample that occurs at the intersection of two edges will be included only if both edges should 
include the sample. These rules are illustrated in the following diagram. 


Figure 7 - Polygon rasterization rules 


O O O O a O O O O 


Samples are indicated by 
circles. 


Each triangle includes only 
the similarly-shaded 
samples that it overlaps. 





Points are converted into a quad (two triangles) by extending them by half the point size horizontally and 
vertically about the center. Once this is completed, the rules for polygon edges are applied to determine 
which samples the point includes. Lines are converted into quads in a similar way, and again, the rules for 
polygon edges are applied to determine which samples the line will include. 


4.4.4 Using vertex functions 
The following functions can specify vertex data and indices to vertex data: 


Code 6 - Vertex functions 


GXPosition[n] [t] 

n: {1, 2, 3}, t: {s8, u8, sl6, ul6, £32, x8, x16} 
GxXNormal [n] [t] 

n: {1, 3}, t: {s8, sl6, £32, x8, x16} 
GxColor[n] [t] 

hi {1, 3) 4h, ee {us wl6, w32,; «8, x16} 
GXTexCoord|[n] [t] 

n: {1, 2}, t: {s8, u8, si6, ul6, £32; x8; x16} 
GXMatrixIndex1u8 
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You draw primitives by calling vertex functions (GXPosition, GXColor, etc.) between GXBegin/GXEnd 
pairs. You must call a vertex function for each attribute you enable using GXSetVtxDesc (), and for each 
vertex in the order specified in "Table 1 - Vertex attribute order requirements" on page 13. Each vertex 
function has a suffix of the form Gx [data] [n] [t], where data describes an attribute (e.g., position, color, 
etc.), and n describes the number and f¢ the type of elements passed to the vertex function. See the GX 
pages in the online Dolphin Reference Manual for details on each particular vertex function. 


Code 7 - Drawing primitives using vertex functions 
GXBegin(GX_TRIANGLES, GX_VTXFMTO, 3); 


GXPosition1x8(0); // index to position 
GXColor1x16(0); // index to color 


GXPosition1x8 (1); 
GXColor1x16(1); 


GXPosition1x8 (2); 
GXColor1x16 (2); 


GXEnd(); 


GXBegin specifies the type of primitive, an index into the VAT, and the number of vertices between the 
GXBegin/GxXEnd pair. This information, along with the latest call to GxSet Vt xDesc () , fully describes the 
primitive, vertex, and attribute format. GxEnd() is anull macro in the non-debug version of the library. In 
the debug version, it makes sure that GXBegin and GXEnd are paired properly. You may call vertex func- 
tions between GXBegin and GXEnd only. 














The data type for each attribute should correspond to the vertex attribute format selected. In the example 
below, the data is indexed, so you call vertex functions that describe the format of the index (the ‘x8’ or 
‘x16’ type is used to indicate indices): 


Code 8 - Using vertex functions 
GxXClearVtxDesc (); 

GXSetVtxDesc (GX_VA_CLRO, GX_INDEX8) ; 
GXSetVtxDesc (GX_VA_POS, GX_INDEX16) ; 


GXSetVtxAttrFmt (GX_VTXFMTO, GX_VA_ POS, GX _POS XYZ, GX_U8, 0); 
GXSetVtxAttrFmt (GX_VTXFMTO, GX_VA_CLRO, GX_CLR_RGBA, GX_RGBA8, 0); 


bf. iss 
GXBegin(GX_VTXFMTO, GX_TRIANGLES, 3); 


GXPosition1x8 (0); // index to position 
GXColor1x16(0x3e4); // index to color 


GXPosition1x8 (1); 
GXColor1x16 (0x123e) ; 


GXPosition1x8 (2); 
GXColor1x16(17); 


GXEnd(); 


The function GXPosition1x8 indicates that this function takes one unsigned char (8-bit) index as a 
parameter. The function GXColor1x16 indicates that this function takes one unsigned short (16-bit) index 
as a parameter. 
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4.5 Vertex data organization 


Indexed attribute data supports Nintendo GameCube goal of flexible data organization. The programmer 
can organize data used by animation, collision, and graphics with a minimum of reformatting. 


When using indexed attribute data, the attribute values themselves are stored in main memory in an array. 
The programmer describes graphics primitives using indices to reference into one or more of these arrays. 
The graphics hardware computes the physical addresses from the indices and fetches the data. 


A vertex cache is used to cache parts of the arrays as they are accessed, taking advantage of the natural 
locality in geometric data. The data is stored in the vertex cache in quantized format, which improves effec- 
tive memory bandwidth. 


Figure 8 - Flow of indexed vertex data 
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The following example of a simple textured box shows how indexed data can be more compressed than 
direct data. 


Note: This example only computes memory size. Indexing can also be beneficial in terms of bandwidth, 
because data already in the vertex cache will not need to be read from DRAM. 
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Code 9 - Indexed vs. direct compression example 








// 
// Indexed version of textured cube 
// 
// position data 
£32 MyPos[] = 
{ 
00.0, 00.0, 00.0, 
00.0, 00.0, -100.0, 
00.0, -100.0, 00.0, 
00.0, -100.0, -100.0, 
=100..05 00.0, 00.0, 
-100.0, 00.0, -100.0, 
-100.0, -100.0, 00.0, 
-100.0, -100.0, -100.0 
}; 
// texture data 





ul6 MyTex[] = 
{ 
0x0000, 0x0000, 
0x0000, Ox0f00, 
0x0£00, 0x0000, 
Ox0f00, Ox0f00 
}; 
// 
// draw 6 sides of cube, 2 x 8-bit index per vert 
// 6 sides x 4 verts x 1B/indx x 2 indx/vert = 48B 
// 8 pos x 3 £32 x 4B/f£32 = 96B 
// 4 texcoord x 2 ul6/texcoord x 2B/ul6 = 16B 


// total = 48B + 96B + 16B = 160B 


// 
// Direct version of textured cube 
es 


// draw 6 sides of cube, 4 vertex each 
// 1 pos x 3 £32/pos x 4B/£32 + 1 texcoord x 2 ul6/texcoord x 2B/ul6 = 16B/vtx 


// total = 24 vtx x 16B/vtx = 384B 


The indexed attribute arrays can be compressed, removing duplicate attribute data. Also, you can order 
the data for animation or collision processing so that it will be loaded efficiently into the CPU cache. Keep 
in mind that the graphics chip is not cache-coherent with the CPU. In other words, before the GP accesses 
the data, you must explicitly flush the CPU cache of any attribute array data it has accessed previously 
(see OS function DCStoreRange). Also, you must invalidate the vertex cache using GXInvalidateVtx- 
Cache if you relocate or modify an array of vertex data that is read by, or may be cached by, the vertex 
cache. 
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Nintendo GameCube also supports direct data, which is copied directly into the graphics FIFO (see "13_ 
Graphics FIFO" on page 135). The hardware gets the data from the FIFO and sends it down the pipeline; it 
does not go through the vertex cache. 





Figure 9 - Flow of direct vertex data 
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In addition, it is possible to mix indexed and direct attribute data within a vertex. 


Finally, Nintendo GameCube can generate new vertex data based on the existing vertex data in hardware. 
For example, you can generate texture coordinates from position. This can be considered another form of 
data compression. See "7 Texture coordinate generation” on page 53 for more information. 
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4.5.1 Indexed vertex data 


In addition to specifying the vertex attribute descriptor (using GXSetVtxDesc) to indicate that an attribute 
is indexed, you must also specify a pointer to the array of attribute data, and the stride in bytes between 
successive elements in the array. 


There is a unique base pointer and stride for every attribute type. The pointer and stride are set using the 
GXSetArray function. You can also use GXSetArray for setting pointers and strides for indexed light 
arrays and matrix arrays. 


Figure 10 - Indexed vertex data 










































































GXSetVtxDesc(GX_VA_POS, GX_INDEX8) 
DRAM as GXSetVtxAttrFmt(GX_VTXFMT0, GX_VA_POS. ...) 
x. GXSetArray(GX_VA_POS, ...) 
Position Array 
| - Base Pointer 
- Stride 
Position Array 
GXSetVtxDesc(GX_VA_TEX0, GX_INDEX8) 
GXSetVtxAttrFmt(GX_VTXFMT0, GX_VA_TEX0,...) 
Tex Coord Array GXSetArray(GX_VA_TEXO, ...) 
- Base Pointer 
- Stride 
GXBegin(GX_TRIANGLES., ...) a 
GxPosition1x8(12) ; 1 
GXColor1x8(1) 
GXTexCoord1x8(5) 
pave Tex Coord Array 
. GXSetVtxDesc(GX_VA_CLRO, GX_INDEX8) 
GXSetVtxAttrFmt(GX_VTXFMTO, GX_VA_CLRO, ...) 
2 GXSetArray(GX_VA_CLRO, ...) 
7 Color Array 
GXEnd 
no - Base Pointer 
- Stride 
Color Array 
Code 10 - GXSetArray 
s8 Verts8[18] = { -100, 100, O, 
100, 100, O, 
=100;. =100, 0 49 
u32 Colors[3] = { 
OxFF0O00000, 
OxO0OFFOOO0O, 
OxOOOOFFOO }; 
GxXSetArray(GX_VA_POS, (u32)Verts8, 3); 
GxXSetArray(GX_VA_CLRO, (u32)Colors, sizeof(u32)); 
flrs 
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The stride parameter also allows you to step through structures: 


Code 11 - Arrays of vertex structures 


typedef struct { 

uS Xy;yVrZe 

ul6 s,t; 

u32 rgba; 

u32 private data; 
} MyVert; 


MyVert myverts[100]; 


IL sx 

GXSetArray(GX_VA_POS, &myverts[0].x, sizeof (MyVert) ); 
GXSetArray(GX_VA_CLRO, &myverts[0].rgba, sizeof (MyVert)); 
GXSetArray (GX_VA_TEX0, &myverts[0].s, sizeof (MyVert)); 


GXSetVtxDesc (GX_VA_POS, GX_INDEX8) ; 
GxXSetVtxDesc (GX_VA_CLRO, GX_INDEX8) ; 
GXSetVtxDesc (GX_VA_TEX0, GX_INDEX8) 


GXSetVtxAttrFmt (GX_VTXFMTO, GX_VA_POS, GX POS XYZ, GX_U8, 2); 
GXSetVtxAttrFmt (GX_VTXFMTO, GX_VA_CLRO, GX_CLR_RGBA, GX_RGBA8, 0); 
GXSetVtxAttrFmt (GX_VTXFMTO, GX_VA_TEX0, GX_TEX ST, GX_S16, 4); 





4.5.2 Direct vertex data 


Graphics Library (GX) 


When the vertex descriptor for an attribute is set to GX_DIRECT, the vertex function will copy the vertex 


data into the graphics FIFO (see "13 Graphics FIFO" on page 135). 


Note: This is different from indexed primitives, in which the vertex function copies an index to the data 
into the graphics FIFO, and the data is read directly from main memory by the Graphics Processor. 


Direct data is coherent with the CPU cache, since it is copied through it. 
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Direct data is also useful in cases where the data you want to send is already in the cache. For example, 
matrices are likely to be in cache because they are needed for animation and collision. Therefore, it may 
be more efficient to write them to the graphics FIFO directly, rather than index them from an array. On the 
other hand, direct data uses more bandwidth because each element must be read into the CPU cache, 
written to the graphics FIFO, and then read out of the FIFO again into the Graphics Processor. With 
indexed data, you write the index into the FIFO and only read the data if there is a miss in the vertex cache. 


Code 12 - Direct vertex data 


GxXClearVtxDesc(); 
GxXSetVtxDesc (GX_VA_ POS, GX_DIRECT) ; 
GXSetVtxDesc (GX_VA_CLRO, GX_DIRECT) ; 


GXSetVtxAttrFmt (GX_VTXFMTO, GX_VA_CLRO, GX_CLR_RGB, GX_RGB8, 


GXSetVtxAttrFmt (GX_VTXFMT0, GX_VA_POS, GX_POS XYZ, 


GxXBegin(GX_TRIANGLES, GX_VTXFMTO, Bye 
// vert 0 
GXPosition3f£32( 100.0, 100.0, 0.0 ); 
GXColor3u8( Oxff, 0x00, Ox00 ); 
// vert 1 
GXPosition3f32(0.0, 0.0, 0.0); 
GxXColor3u8( 0x00, Oxff, Ox00 ); 
// vert 2 
GxPosition3£32(100.0, 0.0, 0.0); 
GXColor3u8( 0x00, 0x00, Oxff); 
GXEnd(); 


4.5.3 Mixture of direct and indexed data 


Indexed data may be mixed with direct data in a vertex format. Sometimes, the data size may be small and 
since each vertex is unique, it is not worth indexing. This data can be sent directly while using indexing for 


other attributes: 


Code 13 - Mixture of direct and indexed data 


GxXClearVtxDesc(); 

GXSetVtxDesc (GX_VA_POS, GX_INDEX8) ; 

GXSetVtxDesc (GX_VA_CLRO, GX_DIRECT) ; 
GxXSetArray(GX_VA_POS, &MyPos[0], sizeof(f32)*3); 


GXSetVtxAttrFmt (GX_VTXFMTO, GX_VA_CLRO, GX_CLR_RGB, 
GXSetVtxAttrFmt (GX_VTXFMTO, GX_VA_POS, GX _POS XYZ, 


GxXBegin(GX_TRIANGLES, GX_VTXFMTO, 3); 


// vert 0 

GXPosition1lx8( 1 ); // this is an index, use 
GXColorlul6( Oxf551 ); // this is color data, not an index 
// vert 1 


GXPosition1x8( 0 ); 

GxXColorlul6é ( 0x3243 ); 

// vert 2 

GXPosition1x8( 2 ); 

GXColorlul6é ( 0x1897 ); 
GXEnd(); 
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4.6 Display lists 


A display list is a pre-compiled list of primitive-rendering or state-setting commands. Once the list has been 
created, the GP can access it directly and process the commands as many times as needed. This provides 
tremendous savings in memory bandwidth, compared to having the CPU constantly create and send 
immediate-mode primitives. To get optimum performance from the system, therefore, requires the use of 
display lists. 


Figure 11 - Display list flow 
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4.6.1 Creating display lists 
Display lists may be created in various ways: 


« By using GXBeginDisplayList, GX primitive commands, and GXEndDisplayList. 





* By loading a GPL file created from the Character Pipeline (C3) library. The C3 library is designed to 
make creating and animating characters easier. Logically, C3 is a layer above the GX library (i.e., it 
only calls GX functions; it does not replace GX functionality). 


* By creating an array containing display list command tokens and data. 


4.6.1.1 Using GXBeginDisplayList and GXEndDisplayList 


With this method, you first allocate space in memory in which to store the display list. The starting address 
of the display list must be a multiple of 32 bytes. The OS function OSAl1loc is useful for this purpose, since 
it provides 32-byte-aligned memory pools. Before writing display list commands to this memory area, you 
must make sure that it is forced out of the CPU data cache. This is because the CPU’s write-gather buffer 
(used to write graphics commands to memory) is not cache coherent. The OS command 
DCInvalidateRange may be called to ensure the memory range is not in the cache. 
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Once the memory area has been set up, you can then call: 
Code 14 - GXBeginDisplayList 


void GxXBeginDisplayList ( 
void *list, 
u32 size); 


The /ist argument is the starting address for where the display list will be stored. The size argument indi- 
cates the number of bytes available in the allocated space for writing display list commands; it allows the 
system to check for overflow. Also, the size must be a multiple of 32 bytes. 


Note: Due to padding issues, the size of the memory area may need to be as many as 63 bytes larger 
than the actual number of bytes needed for commands and data. This is explained below. 


Once GXBeginDisplaylList has been called, all further GX commands are written to the display list 
instead of to the normal command FIFO. The GXEndDisplayList command signals the end of the dis- 
play list, and it returns the command stream to the FIFO to which it had been directed previously. The 
GXEndDisplayList command also returns the actual size of the created display list in bytes. The size 
returned will always be a multiple of 32 bytes (i.e., the system will pad the end of the display list will null 
commands as necessary). 








Note: The display list is padded by flushing the write-gather buffer. Flushing is accomplished (internally 
in GX) by writing 32 null commands to the write-gather buffer. As a result, up to 32 bytes may be 
written at the end of the display list. In addition, any of the left-over bytes from a flush (up to 31) 
may be written to the start of a display list. (We conserve CPU cycles by not resetting the write- 
gather buffer to eliminate the left-over bytes.) This is why the size of the allocated space should be 
at least 63 bytes larger than requirements otherwise anticipate. 


Because the write-gather buffer is used to put GX commands into memory, the CPU data cache does not 
need to be flushed after the display list has been written in this manner. However, if the CPU copies a dis- 
play list from one memory area to another, then the data cache may have to be flushed, depending upon 
how the copy is performed. 


Display lists cannot be nested. This means that once a GXBeginDisplayList has been issued, it is ille- 
gal to issue another GXBeginDisplayList or GXCallDisplayList untila GXEndDisplayList com- 
mand comes along. 





4.6.1.2 Loading GPL files 


You can use the Character Pipeline’s C3 tools to create display lists. These display lists are an integral part 
of the GPL files output by the tools. You can then use the geoPalette runtime calls to load a GPL file, fol- 
lowed by geoPalette’s Display Object (DO) runtime calls to render the display lists. For more information, 
refer to “Game Engine Programming” in the Nintendo GameCube Character Pipeline & CG Tools Guide. 
Here are some of the relevant calls: 


Code 15 - geoPalette/display object calls 
Void GetGeoPalette ( GeoPalettePtr *pal, char *name ) 

void GetDisplayObject ( DODisplayObjPtr *dispObj, GeoPalettePtr pal, ul6 id, char *name ) 
void DOShow ( DODisplayObjPtr dispObj ) 

void DOSetWorldMatrix ( DODisplayObjPtr dispObj, Mtx m 

void DORender ( DODisplayObjPtr dispObj, Mtx camera, u8 numLights, ... ) 
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4.6.1.3. Creating arrays containing display list commands 


Appendix B describes some of the command tokens that go into a display list. You can put such tokens 
and the associated data directly into an array. When creating an array whose elements are defined at com- 
pile time, use the ATTRIBUTE ALIGN (32) pragma (specific to the Metrowerks CodeWarrior compiler) to 
guarantee proper alignment. You must also be sure to pad the length of the array to a multiple of 32 bytes 
using null commands, as in the following example: 





Code 16 - Sample array containing display list 


u8 OneTriDL[] ATTRIBUTE ALIGN(32) = 
{ 


(GX_DRAW_QUADS | GX_VTXFMTO), // command, primitive type | vat idx 
0: 36, // number of verts, 16b 
8, Oy Tp Oy. Be Oy Bp Oy // quad 0 

Le Le 2; 1, Ty My 6, 1; // quad 1 

i; 25 Oy. is OF 2p OS. i // quad 2 

4, Ay 1, 1; 10; 17 V1, dy // quad 3 

Ly 2, 12, 2, 13, 2, 2, 2, // quad 4 

2, OQ; 23, O, 24. 0, Sy. Oy // quad 5 

18; 2). 15; 2) he, Bye U7 2; // quad 6 

20, 1, 17, 1, 16, 1, 19, 1, // quad 7 

203. Oy 21, 03. “8s Oy ae Ol // quad 8 


GX_NOP, GX NOP, GX NOP, GX NOP, GX NOP, GX NOP, GX_NOP, // pad 
GX_NOP, GX NOP, GX NOP, GX NOP, GX NOP, GX NOP, GX_NOP, // pad 
GX_NOP, GX_NOP, GX_NOP, GX_NOP, GX NOP, GX_NOP, GX NOP // pad to 32B 




















Arrays created in this manner are loaded into memory via the disc system, and thus should be guaranteed 
to be resident in memory and not in any CPU cache, so no special cache flushing is required. 


In addition, display list arrays can be created dynamically. You can use the OSA11oc function to acquire a 
pool of memory that is guaranteed to be 32-byte aligned, then stuff the array with the desired command 
tokens and data. (Remember, of course, to pad the end of the command stream to a 32-byte boundary 
using null commands.) Finally, the array must be flushed from the CPU’s data cache before the Graphics 
Processor can call it. You can use the function DCStoreRange (or DCFlushRange) to do this. 


4.6.2 Drawing primitives using display lists 
Once a display list has been created, you can call it using: 
Code 17 - GXCallDisplayList 


void GxXCallDisplayList ( 
void *list, 
u32 size); 


This call takes the size of the display list to increase the efficiency of display list processing and obviate the 
need for an explicit return command. 


As shown in "Figure 11 - Display list flow" on page 28, the Graphics Processor has its own logic to handle 
GXCallDisplayList commands. CPU involvement is not necessary to change over the FIFO source, 
since the GP handles this (and, in fact, has separate internal FIFOs for mainstream graphics commands 
vs. display list commands). This allows the GP to handle display lists very efficiently. 
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4.6.3 Effect on machine state 


A call to GXCallDisplayList does not perform any state pushing and popping. The only effect is to tem- 
porarily change the source of graphics commands from the original source to the display list. Therefore, 
any state that is changed during a display list call remains changed after the display list has been pro- 
cessed. 


Display lists that include state-changing commands have further complications. Certain state registers in 
the GP include more than one piece of state. However, when writing to such a register, all of the included 
state is affected. You cannot write to only certain bits of a register without writing to all the other bits. 


The GX API maintains shadow copies of the registers that are updated as the CPU processes GX com- 
mands. However, when the GP processes display lists, any state-changing commands in the display list 
will update the actual registers and not update the shadow copies of the registers. Consequently, the state- 
changing commands that occur in a display list (or after a display list) may unexpectedly affect other pieces 
of state as well. Also, calls to inquire about current state (which read the shadow registers) may return 
incorrect results. 


As a result, one should be very careful about placing state-changing commands within display lists. You 
may want to separate state from geometry within a display list, or else limit the state-changing commands 
to ones that relate to the contained geometry. Geometry-only display lists can thus be used without worry- 
ing about side effects, and the user need only pay special attention when using state-changing display 
lists. 


There is one further complication. Not all of the GX commands act immediately (“act” means inserting 
commands into the current FIFO or command buffer). Instead, some set variables within GX, and the 
actual relevant GP commands are not sent out until a GXBegin command is seen. This is known as “lazy 
evaluation,” and the purpose is to avoid sending unnecessary commands which could slow the system 
down. This “lazy state” is also flushed by GxBeginDisplayList before the display list is started, by 
GXEndDisplayList before the display list is finished, and by GXCallDisplaylList before the display 
list is called. Currently, the lazy state includes the VCD/VAT registers, the texture-coordinate scale regis- 
ters, and the GEN_MODE register. The latter contains bits indicating the number of active TEV and indirect 
stages, the number of active textures, the number of rasterized colors, back/front culling mode, and anti- 
aliasing mode (these features are described in later chapters). 

















Appendix C includes more information on display-list format, and it details which pieces of state are tied 
together within the hardware registers. 


4.7 GXDraw functions 

A number of basic 3D objects can be drawn using functions provided by GX. The following are provided: 
* Cylinder. 

* Torus. 

* Sphere (iterated). 

* Sphere (recursive). 

* Cube. 

* Dodecahedron. 

* Octahedron. 

* Icosahedron. 


« Normal table. 
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The following features are common: 

¢ All shapes fit tightly within X = +/- 1, Y=+/-1,Z=+/- 1. 

* The shapes are typically symmetric around the Z axis. 

* All functions save and restore the VCD and VAT/VTXFMT3. 
¢« Positions and normals are provided for all shapes. 


Texture coordinates may be provided for the torus, iterated sphere, and cube. They will be sent if the VCD 
had texture coordinates enabled prior to the function being called. Similarly, NBT normals may be provided 
for the cube. 


These are the functions themselves: 
Code 18 - GX Draw functions 


void GXDrawCylinder(u8 numEdges) ; 

void GXDrawTorus(f32 rc, u8 numc, u8 numt); 
void GXDrawSphere(u8 numMajor, u8 numMinor); 
void GXDrawCube (void); 

void GXDrawDodeca (void) ; 

void GXDrawOctahedron( void ); 

void GXDrawIcosahedron( void ); 

void GXDrawSpherel( u8 depth ); 

u32 GXGenNormalTable( u8 depth, £32* table ); 





GXDrawTorus takes arguments specifying the radius of the cross-section (i.e., the “fatness,” with 0 < re < 
1), the number of subdivisions around the cross-section, and the number of subdivisions around the over- 
all torus. GXxDrawSphere is the iterated sphere, and it takes arguments specifying the number of lateral 
subdivisions and the number of longitudinal subdivisions. GXDrawSpherel is the recursive sphere; it is 
generated by recursively subdividing an icosahedron to the specified depth. GXGenNormalTable allows a 
normal table to be generated by recursive subdivision of an icosahedron. You specify the recursion depth 
and a pointer to memory to store the table. The function returns the total number of normals generated. 
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In this chapter, we describe the transformation section of the Graphics Processor. 


33 


The GP has an internal matrix memory. The programmer can load several matrices into the memory and 
specify one of them as the current matrix using GXSetCurrentMtx. Another way to specify a matrix is to 
provide a per-vertex matrix index. Vertices that do not specify a modelview matrix index will use the current 
matrix. When a matrix index is specified, the index used becomes the current matrix index; i.e., it over- 
writes the index set by GXSetCurrentMtx. 


The same matrix memory index which specifies the modelview matrix that transforms the vertex position 
also specifies the normal matrix when lighting is enabled. 


Note: The picture below is somewhat simplified. Refer to "5.6 How to override the default matrix memory 
configuration" on page 39 for more details on the matrix configuration. 


Figure 12 - Modelview and projection data path 
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5.1. Loading a modelview matrix 
The functions shown below are used to load a position matrix and to specify which matrix should be used: 
Code 19 - GXLoadPosMtxImm 


GXLoadPosMtxImm(&v, GX_PNMTX0O) ; 
GXSetCurrentMtx (GX_PNMTXO) ; 


Assuming an implicit W of 1.0, the basic vertex position transform from object or model space to homoge- 
neous eye space is: 


Equation 2 - Vertex position transform 


X 
a, 
Y, |= [Transformifatrix(n)(3x4) |x 
Z 
‘ 1.0 


The matrix memory is configured by default to contain: 


* 10 position and normal matrix pairs (GX_PNMTX0-9), described by the enumeration GXPosNrmMtx. 





* 10 texture matrices (GX_TEXMTX0-9), described by the enumeration GXTexMtx. 





* An identity matrix (GX_IDENTITY), also described by the enumeration GXTexMtx. 
All matrices use floating point data. 


The normal matrices are used for vertex lighting (see "6 Vertex lighting" on page 41). The texture matrices 
are used for various texture coordinate operations ("7 Texture coordinate generation" on page 53). In the 
example onetri.c, GXInit sets the default matrix to GX_PNMTXO. 


The normal transform is similar to the position transform; however, translation is neither required nor use- 
ful. Consequently, the matrix does not need to convert a homogeneous normal. The normal is not trans- 
formed by the projection and screen space conversions. The function GXLoadNrmMtxImm loads and 
converts a 3x4 matrix into a 3x3 matrix in normal matrix memory. This assumes that the normal matrix is 
usually the inverse transpose of the modelview matrix, which is usually a 3x4 matrix. The functions 
GXLoadNrmMtxImm3x3 and GXLoadNrmMtxIndx3x3 may be used to load a normal matrix directly from 
a 3x3 matrix in main memory. 


Note: There is no function to do an indexed load of a 3x3 matrix from an array of 3x4 matrices. 
The transformed normal is re-normalized before being used in the lighting calculations. 


Equation 3 - Vertex normal transform 


Mw N, 
Ny. |= [NormalMatrix (2 (3x 3)] x| MV, 
N N 


ze 
A matrix can be loaded either by copying it directly into the Graphics FIFO (GXLoadPosMtxImm) or by 
indexing a matrix array in DRAM (GXLoadPosMtxIndx). Indexed matrices are loaded directly by the 
graphics hardware; i.e., the matrix data is never sent through the Graphics FIFO. 


= 


Note: Indexed matrix loads have cache coherency issues; see Appendix E. 
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When using index matrices, you must first set up a base pointer and stride for the array you wish to access 
using the GXSetArray function. The base pointer is the address of the first element in the array (index = 
0). The stride is the number of bytes between successive elements of interest in an array. For example, an 
array of 3x4 floating point matrices would have a stride of 48B = 3 rows x 4 columns x 4B/float. The stride 
also allows the programmer to index arrays of structures that have other data, as well as matrices, in them. 
The maximum stride accepted by GXSetArray is 255B. Use the attributes Gk VA POS MTX ARRAY, 

GX VA _NRM MTX ARRAY, and GX_VA_ TEX MTX ARRAY to specify a matrix array. 














Note: Matrix memory is not a matrix stack. It is assumed that the application will manage matrix stacks in 
main memory and concatenate matrices using the CPU. The loaded matrix should transform the 
object to be drawn from local model space to view space. The MTX library supports creating and 
manipulating matrix stacks. 


It is the application's responsibility to manage the matrix memory; the GX API simply provides a mecha- 
nism for loading and using matrices. 


A utility library is provided for matrix and vector math functions; see “Matrix-Vector Library (MTX).” 


5.2 Setting a projection matrix 


The GP performs standard projection transforms for each vertex position. The projection is done sepa- 
rately from, and after, the modelview transform in the GP. The GXSetProjection function will load a sin- 
gle projection matrix. 


Code 20 - GXSetProjection 


GXSetProjection(p, GX PERSPECTIVE) ; 


The first argument is a pointer to a 4x4 projection matrix (see “Matrix-Vector Library (MTX)” for more infor- 
mation on matrix format and construction). The second argument indicates whether the matrix is perspec- 
tive or orthographic. The projection transform hardware assumes the following form for the projection 
matrix: 


Equation 4 - Perspective projection 


X.]) [po o pt o)fy, 
fe |_|) 0 .pe po OY 
z.| |[o o p4 psilz, 
Ww.) }o o -1 offi 


Equation 5 - Orthographic projection 


A, po 60 

_| 0 pe 
z.| |o 0 Ps : 
Ww.) |o 0 


The Matrix-Vector library contains functions to help set up perspective and orthographic projection matri- 
ces conveniently. They include MTXFrustum, MTXPerspective, and MTXOrtho. 


os 


The resultant homogeneous coordinates are in clip space. Once the clip space coordinates are obtained, 
1.0/Wc is computed for converting to non-homogenous coordinates. Once this is done, the (x, y, z) coordi- 
nates will lie within the normalized space of ( [-1...+1], [-1...+1], [-1...0] ). 
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5.3 Culling, clipping, and scissoring 
Figure 13 - Clipping and culling data path 
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The coordinates resulting from the projection transform are said to be in clip space [Xc, Yc, Zc, Wc]. The 
hardware will conditionally reject triangles that are frontfacing, backfacing, or both front- and backfacing as 
set by the GXSetCul1Mode function. As stated earlier, frontfacing triangles are those whose vertices 
appear onscreen in clockwise order. 


Guardband clipping to +/-2Wc reduces the amount of clipping in the transform unit. The rasterization pro- 
cess uses evaluation to compute only pixels that are within the visible screen, so there is no fill rate penalty 
for this type of clipping. 


Triangles are clipped when they are both outside the guardband region and inside the viewport. Other tri- 
angles will be either totally out of the viewport (trivially rejected), totally inside the viewport (trivially 
accepted), or partially inside the viewport (trivially accepted but scissored). 
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Points are trivially accepted when they are within +/-2Wc and are trivially rejected when they are outside +/ 
-2Wc. This prevents wide points from being trivially rejected when the vertex is outside of +-Wce. In this 
case, due to the point’s width, some of it may be visible. Lines are also only trivially rejected when they are 
outside +/-2Wc. 


Figure 14 - Clip coordinates 


-2Wc ; 2Wc 


























-—— Guardband 







Clip Wide point 


Trivial Accept 





Wide point 


Trivial Accept Trivial Accept 









Viewport 






Wide point 


Trivial Reject 
Trivial Reject 




















After clipping, the vertex (x, y, Z) is perspective-divided. The viewport transform converts the coordinates 
into screen space. 


Note: Actual clipping is a very slow procedure that creates stalls in the transform engine, thus it should 
be avoided whenever possible. Clipping can be disabled by the function GxSetC1ipMode. There 
is a guardband at the far clipping plane that makes it acceptable to turn off clipping for most far- 
clipped objects. However, disabling clipping for near-clipped objects results in incorrectly drawn 
polygons. 


GXInit enables backface culling and sets the viewport and scissor box to full screen size. 


5.4 Viewport and scissoring 
The following equation performs the conversion from clip space to screen space and perspective scaling: 


Equation 6 - Clip space to screen space conversion 


x, 10 Pa Seem X apret 
i, =—_* eS sate + Yappset 
Z, : Ly "2 scale 2 oppset 


The resultant screen space coordinate is then sent to the setup unit for rasterization. The 1/Wc value is 
also sent to the setup unit for texture space computations. 
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The viewport is set using the following function: 
Code 21 - GXSetViewport 


GXSetViewport ( 
£32 xOrig, 
£32 yOrig, 
£32 width, 
£32 height, 
£32 nearZ, 
£32 farZ ); 


The screen space origin (0, 0) is at the top-left corner of the display. The screen space scale and offset are 
computed using floating-point values. This is used to advantage in the function GXSetViewportJitter 
to jitter the viewport by half a line in field rendering modes. 


GXSetScissor sets the scissor box, typically to the same size as the viewport. 
Code 22 - GXSetScissor 


GXSetScissor ( 
u32 left, 
u32 top, 
u32 width, 
u32 height); 


5.5 Coordinate systems 


The GX API assumes a right-handed coordinate system for model and eye space. The eye in eye space is 
assumed to be looking down the negative Z axis. In order to map eye space into the viewport, the coordi- 
nates undergo two changes of coordinate systems. The first occurs during projection into clip space 
(assuming the MTX library projection routines are used). As a result of this transformation, the Z axis is 
flipped, with Znear mapped to -W and Zfar mapped to zero. The second change occurs during the viewport 
mapping into screen space. In this mapping, the Y axis is flipped, and the Z values are offset such that 
Znear maps to the viewport near Z and Zfar maps to the viewport far Z. The diagram below illustrates the 
process: 


Figure 15 - Coordinate system transformations 
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The function GXProject is provided to transform a single point from object space to screen space. You 
pass it the object coordinate, the model-view matrix, the projection matrix, and the viewport. It returns the 
transformed point: 


Code 23 - GXProject 


void GXProject ( 


£32 x; // object coordinates 

£32 ¥; 

£32: i, 

£32 mtx[3] [4], // model-view matrix 

£32* pm, // projection matrix, as returned by GXGetProjectionv 
£32* vp, // viewport, as returned by GXGetViewportv 

£32* sx, // veturned screen coordinates 

£32" SY; 

£32* sz. ()3 


5.6 How to override the default matrix memory configuration 


The GX API configures the matrix memory in a way that is generally useful for a wide variety of applica- 
tions. In some specific cases, you may want to override this default configuration. Here we describe the 
physical layout of matrix memory and the rules an application must follow to allocate the matrix memory 
successfully. 


The matrix memory consists physically of two separate memories: one for modelview and texture matrices, 
the other for normal matrices. (HW2 adds a third memory; see "15 GX updates for HW2" on page 155.) 


Figure 16 - Matrix memory 
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The modelview/texture matrix memory consists of 64 rows, each row consisting of four floats. A matrix is 
loaded as a set of contiguous rows in matrix memory. The index used by GXLoadPosMtx* and 
GXLoadTexMtx* to specify the matrix in matrix memory (GX_PNMTXO or GX_TEXMTX5, for example) is 
actually the row address where the first row of the matrix is loaded. Modelview matrices loaded using 
GXLoadPosMtx* are assumed to have three rows (i.e., 3x4 matrices). Texture matrices can have either 
two or three rows, specified in GXSet TexCoordGen. The matrices may be loaded starting at any row 
address. 
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The normal matrix memory consists of 32 rows, but each row contains only three floats. Since normals are 
not required to be homogeneous, only 3x3 matrices are needed. Normal matrices can be loaded from 
either a 3x4 matrix (GXLoadNrmMtxImm) or from a 3x3 matrix (GXLoadNrmMtxImm3x3) in main memory. 
Like modelview and texture matrices, normal matrices are indexed using the starting row address in nor- 
mal matrix memory. However, normal matrices use the same index as modelview matrix, so the two must 
be allocated as a pair. Normally, the matrix used to transform the normal is the inverse transpose of the 
modelview matrix, so they are naturally pairs. 


Since the modelview matrix index can address 64 rows, but the normal matrix index can address only 32 
rows, the hardware computes the normal index as: 


Equation 7 - Normal matrix index 
norm_mtx_indx = pos_mtx_indx % 32 


For example, a modelview matrix row address of 33 or 1 will address the same normal matrix (at row 
address 1). 


Note: If you wish to use modelview matrices beyond the first ten matrices (thirty rows), you need to leave 
a gap at row 30 and row 31. The eleventh modelview matrix must start at row 32 in order to line up 
with a normal matrix. 


The default matrix configuration organizes the modelview/texture matrix memory as follows: 
* Ten modelview matrices (3x4). 

* Ten texture matrices (3x4). 

* One identity matrix (3x4). 


Note: Sixty-three rows are used in this configuration. 
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6 Vertex lighting 


6.1. Lighting pipeline 


Nintendo GameCube supports lighting in hardware as a per-vertex calculation. This means that a color 
(RGB) value can be computed for every lit vertex, and that these colors are then linearly interpolated over 
the surface of each lit triangle (known as Gouraud shading). 


Nintendo GameCube has full support for diffuse local spotlights. There is also some support for infinite 
specular lighting. This chapter focuses mainly on diffuse lighting. Specular lighting is covered in "6.6 Spec- 


ular lighting" on page 49. 


6.1.1 Diffuse lights, diffuse attenuation and vertex normals 


The hardware supports diffuse attenuation. This means that the front of the object can be brighter than the 
sides, and the back darkest. Diffuse attenuation is the primary reason vertex normals are supported. For 
each vertex, the vertex normal (N) is compared against the vector between the vertex and light position. 


This encompasses two important physical effects. First, surfaces on 'the back' of an object receive no light. 
This can be seen as a simple self-shadowing technique—one which only works for convex objects. Sec- 
ond, surfaces facing the light are lit more or less, depending on the incident angle of the incoming light. 


6.1.2 Local lights and range attenuation 


The hardware supports /ocal lights. Local lights have a position within the world and, possibly, a direction. 
In fact, each light must have a position. Using the position of each vertex and the position of the light, the 
hardware can perform per-vertex distance attenuation. This means that you can make the brightness of 
the light shining on an object decrease as the object moves away from the light. 


6.1.3 Spotlights, directional lights and angle attenuation 


The hardware supports directional lights, ranging from non-directional lights, to subtle directional effects, to 
highly directional spotlights. These effects are supported by angle attenuation. This means that vertices 
directly “in the beam” of the light can be made brighter than vertices outside the beam or behind the light. 


Local diffuse lights can be both distance- and angle-attenuated (spotlights). By programming the proper 
lighting equation, you can obtain the attenuation value as an output color or alpha. This color or alpha can 
then be used in the Texture Environment (TEV) unit to attenuate projected texture lights. 


The hardware supports eight physical lights. The programmer can describe the attenuation parameters, 
position, direction, and color of each light. The programmer can control up to four physical color channels 
that accumulate the result of the lighting equation. By associating lights with channels, the programmer 
can choose to sum the effect of multiple lights per vertex, or combine them later in the TEV. The number of 
channels available to the TEV is set by GXSetNumChans. In some cases (e.g., when using a color channel 
to generate texture coordinates), a light channel is computed but not output. If only one channel is avail- 
able to the TEV, it is Gx_COLOROAO. The light channel GX_COLOR1A1 is only available if two channels are 
output to the TEV. 


Code 24 - GXSetNumChans 


GXSetNumChans( u8 nChans ); 


Each color channel allows the enabling of attenuation and the selection of the color source. A light mask 
associates up to eight lights with the channel. 
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One light is supported at the peak vertex rate of 25 million vertices/second. With two texture coordinates 
per vertex and two lights, the peak vertex rate is 18.3 million vertices/second. With two texture coordinates 
per vertex and four lights, the peak vertex rate is 12.5 million vertices/second. 


Figure 17 - Associating lights with color channels 
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6.2 Diffuse lighting equations 


Figure 18 - Lighting vectors 





Equation 8 - Light parameters 


t= {0 pe2.9 4 legp ly} ,ColorChannel 

j = {0,1,2,3,4,5,6,7} , Light 

i= LightParameter 

Equation 9 - Rasterized color 
ChannelColar, 
ChannelColar, 
Equation 10 - Color channel 


RasterizedColorycn, = 


ChannelColor, = Material, x LightPunc; 
Equation 11 - Material source 


Material, = (MateriaiSre, = GY _ SRC _ REG ? MaterialReg, : VertexColor) 
Equation 12 - Channel enable 


1.0, if LightPuncHnable, = FALSE 
LightFunc, = 
illum if LightFuncHnable, = TRUE 
Equation 13 - Sum of lights in a channel 


jn? 

illum, = cao am, + saneai 5 Lightifask, ( j) Aiten ; (7) Diffuse Atten, ( oot) 
jf 

Equation 14 - Ambient source 


Amb, = (AmbSre, = GY _SRC _REG ? AmbientReg, : VertexColor,) 
Equation 15 - Diffuse attenuation 


1.0, if DiffAttenSelect, = GX __DF _NONE 
Diffuse Aten, (j) = N-L,, if DiffAttenSelect, = GY _DF _SIGN 
Clamp0(N -£;), if DiffAttenSelect, = GX _DF _CLAMP 
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Equation 16 - Diffuse angle and distance attenuation 
Clamp Ola, AAtt (i) + a, AA, C7)+ ay,] 


ky di( 3)" + ky 2;( 7) + hy; 
AAtt,(j) = Clamp 0(L,-Ly,) li GX __AF _SPOT 


Atten,(j) = GX _ AF _NONE ?1: 


d,(j)= ji, 2; i G¥ __AF _SPOT 


6.3. Matrix memory 


To use per-vertex lighting, you must provide a normal with each vertex. In order for the hardware to trans- 
form the normal, you must provide a normal matrix. The normal matrix must be the inverse transpose of 
the modelview matrix. The position modelview and normal modelview matrix are indexed by a single index. 
In other words, they are considered to be a pair. The transformed normal is re-normalized before the light- 
ing computations. 


For directional lights, you must provide a light normal for each active light. It is the application’s responsibil- 
ity to transform the light normal and light position into view space when the viewpoint changes. The world- 
to-view matrix should be used to transform the light’s position. The light’s direction should be transformed 
by the inverse transpose of the world-to-view matrix. The light’s direction is not normalized by the hard- 
ware; therefore, the application must ensure it is properly normalized. 


6.4 Light parameters 


You may define the position, direction, attenuation factors, and color for each physical light. There are eight 
sets of physical light parameters. Light state is stored ina GXLightObj structure. The application is 
responsible for allocating the memory for a GXLightObj. The GXInitLight* functions can be used to 
initialize or modify the GXLightObj structure. The GXLoadLightObj Imm or GXLoadLightObjIndx 
function is used to load the GXLightObj parameters into a physical light. GkLoadLightObj Indx has 
cache-coherence issues; see "E.3 Data coherency" on page 214 for more details. 


6.4.1 Angle attenuation 


The function GXInitLightAttn is used to initialize parameters used to compute angle and distance 
attenuation, as shown in "Equation 9 - Rasterized color" on page 43. 


Code 25 - GXInitLightAttn 


void GXInitLightAttn ( 
GXLightObj *1lt_obj, 


£32 a0, 
£32 al, 
£32 a2, 
£32 kO, 
£32 kl, 
£32 k2 ); 


The angle attenuation (a0, al1,a2) is a quadratic function of the cosine of the angle between the light 
direction and the light to vertex direction. By controlling the quadratic function’s coefficients, you control the 
effective angle of the light. 
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A more convenient way of controlling the angle attenuation is provided by: 


Code 26 - GXInitLightSpot 


GXInitLightSpot ( 
GXLightObj* l1t_obj, 
£32 cutoff, 
GXSpotFn spot_fn ); 


This function defines two easy-to-control parameters, rather than a0, a7, a2 used by GXInitLightAttn. 
The parameter cutoff specifies cutoff angle of the spotlight in degrees. The spotlight works while the angle 
between the ray for a vertex and the light direction given by GXInitLightDir is smaller than this cutoff 
angle. The value for cutoff should be within (0.0 < cutoff <= 90.0), otherwise given light object doesn't 
become a spotlight. 


The parameter spot_fn defines type of the illumination distribution within cutoff angle. The following graphs 
show curve shape of the distribution functions given by acceptable values for spot_fn. The value 
GX_SP_OFF turns spotlight feature off even if the color channel setting is using GX_AF_ SPOT (see 
GXSetChanCtrl). 


Figure 19 - Spotlight functions 
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Note: This function sets parameters only for angular attenuation. Parameters for distance attenuation 
should be set using GXInitLightDistAttn. You can also use GXInitLightAttn, but you 
have to care about the order for calling these functions because GXInitLightAttn overwrites 
parameters for both angle and distance attenuation. 


6.4.2 Distance attenuation 


GxInitLightAttn can also be used to control the light’s distance attenuation characteristics. As shown 
in "Equation 9 - Rasterized color" on page 43, the distance attenuation is an inverse quadratic function of 
distance from the light to the vertex in world coordinates. By controlling the coefficients kO, k7, and k2 the 
attenuation function can be controlled. 
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A more convenient way to control the distance attenuation is provided by: 
Code 27 - GXInitLightDistAttn 


GXInitLightDistAttn ( 
GXLightObj *l1t_obj, 
£32 ref distance, 
£32 ref brightness, 


GXDistAttnFn dist func ); 


In this function, you can specify the brightness on a defined reference point. The parameter ref_distance is 
distance between the light and the reference point. The parameter ref_brightness specifies the ratio of the 
brightness at the reference point. The value for ref_distance should be greater than 0 and ref_brightness 
should be within O<ref_brightness<1, otherwise the distance attenuation feature is turned off. 


The parameter dist_func defines how brightness decreases as a function of distance. The following graphs 
show the curve shapes given by acceptable values for dist_func. The value GX_DA_ OFF turns distance 
attenuation feature off. 


Figure 20 - Distance attenuation functions 
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Note: This function sets parameters only for distance attenuation. The parameters for angle attenuation 
should be set using GXInitLightSpot. You can also use GXInitLightAttn, but you have to 
be careful about order when calling these functions because GXInitLightAttn overwrites 
parameters for both angle and distance attenuation. 
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6.5 Channel parameters 


6.5.1 Channel colors 


A vertex may include up to two colors, each having up to two channels: Color (R, G B) and Alpha (A). 
Taken together there are a total of four channels: color0, color1, alphaO, and alpha7. 


Each channel has an associated ambient color or alpha and a material color or alpha. These colors can 
come from vertex colors or from special ambient and material registers. The register colors are set using 
the functions: 


Code 28 - GXSetChanAmbColor 


void GXSetChanAmbColor ( 
GXChannelID chan, 
GxXColor amb color ); 

void GXSetChanMatColor ( 
GXChannelID chan, 
GxXColor mat_color ); 


Note: Only the components of GXxColor needed by the channel are set. Also, you may set both the color 
and alpha of a channel at the same time if this is more convenient. 

6.5.2 Channel control 

Each channel is controlled using the function: 

Code 29 - GXSetChanCtrl 





GxXSetChanCtrl ( 

GXChannelID chan, 
GXBool enable, 
GxColorSre amb_src, 
GxXColorSre mat_src, 
GXLightID light_mask, 
GXDiffuseFn diff fn, 
GxXAttnFn attn_fn ); 





If a lighting channel is disabled, enable = GX_DISABLE, the material color for that channel will be passed 
through unmodified to be rasterized. The mat_src parameter determines whether the material color comes 
from the vertex color or from the material register. 


When a channel is enabled, the lighting equation is computed for each light enabled in the light_mask. 


Note: The spot light enable, attn_fn, is defined as part of the channel, even though the angle attenuation 
parameters are part of the light description. Diffuse attenuation can be enabled using diff_fn. Nor- 
mally, when diff_fn is enabled, the GX_DF_CLAMP value is used. Sometimes it is useful to disable 
diffuse attenuation in order to pass distance attenuation directly to the TEV. 
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6.5.3 Pre-lighting 


Often, it is useful to pre-light an object in a modeling tool, such as 3D Studio MAX. The result of pre-lighting 
is usually captured in the vertex colors of the object. When the lighting channel is configured properly, you 
can combine hardware-computed local diffuse lighting with pre-lighting. One example, shown below, 
assumes the pre-lit color is G<_VA_CLRO per-vertex. The equation we want to implement is: 


Equation 17 - Pre-lighting 


Gt clr = pre_fit _cir* (amb _scale+ diff _scale* other_atin* diff kt clr) 


amb scale+ diff _scale =1.0 


When no local diffuse light is shining on an object, the color is equal to the ambient pre-lit color which is 
(pre_lit_clr*amb_scale). When a light is shining on the object, the percentage of pre-lit color is increased 
until, where the light is the brightest, the full value of pre-lit color is used. 


The following example sets up GX_COLORO channel for pre-lighting. The vertex color0 will be the pre-lit 
color at full intensity. The ambient color is set to white and scaled so that with no lighting the vertex color 
will be 25% of the pre-lit color. The diffuse light color is scaled so that when fully lit, the vertex color will 
equal 100% of the pre-lit color. 


Code 30 - Pre-lighting API 


// init light position and direction and dist atten 
GXInitLightColor ( 

Lt_obj, 

ScaleColor(myLitColor, 0.75) ); // diffuse scale 
GXInitLightSpot ( 

Lt_Obj, 

30.0, 

GX_SP_COS2 ); 

GXLoadLightObj Imm ( 

Lt_obj, 

GX_LIGHTO ); 
GxSetChanAmbColor ( 

GX_COLORO, 

ScaleColor (White, 0.25)); // ambient scale 
GxXSetChanCtrl ( 

GX_COLORO, 

GX_ENABLE, 

GX_SRC_REG, // ambient color source 

GX_SRC_VTX, // material color source (pre-lit color) 
GX_LIGHTO, 

GX_DF_ CLAMP, 

GX_AF SPOT ); 
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6.6 Specular lighting 


The GP supports the computation of specular lighting. Specularity is actually a surface property that is 
commonly equated with “shininess.” Specular highlights result when a surface is angled such that it reflects 
the light from the light source toward the eye point. Specular lighting is implemented by creating an infinite 
specular light source and modifying the angle attenuation control appropriately. In the equations below, “H” 
refers to the half-angle between the vectors to the light and to the eye, as shown in the figure below: 


Figure 21 - Specular lighting vectors 


Equation 18 - Specular attenuation 
“2 : 
Clamp Ofe,, AAI (7) + a, ,AALt, Cit ay, } 
ky diz) + ky ,2;(7) + ko; 
AAtt (j)= NE > 0? Clamp0(N-H,): 04 GX _AF _SPEC 


a(f=N-L > 0? Clamp 0(N.H,):04 GY _AF_SPEC 
One may specify a specular light in the GxSetChanCtr1 function by setting the GXAttnFn parameter to 
GX_AF_ SPECULAR. 


Atten,(j) = GX __ AF _NONE?1: 





Notes: 


¢ Since specular light sources are infinite, distance attenuation does not apply to them. You specify 
the specular light direction using GXInitSpecularDir. It will compute and store the half-angle 
and light direction. You may also specify the half-angle directly using the function 
GXInitSpecularDirHA. 


¢ Setting a light this way overwrites the position and direction from a diffuse light source. 
« You should not use GXInitLightDir or GXInitLightPos with a specular light source. 


A specular light’s half-angle is stored in the “light direction” field of the light object, while the specular light 
direction is stored in the “light position” field of the light object. The specular light direction is first multiplied 
by 220 before being stored. This factor does not affect the specular computation, since the direction is 
normalized first, and the factor allows the same light object to be used as a diffuse, non-directional light 
source (when used with a different channel). 


Note: The specular-computed light and the diffuse-computed light can only be combined in the TEV. You 
can use the macro GXInitLightShininess to control the specular attenuation function: 


Code 31 - GXInitLightShininess() 


GXInitLightShininess(lt_obj, shininess) ; 
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This macro effectively controls how sharp the specular highlight appears on the lit surface. It sets both the 
distance and angle attenuation coefficients. Care should be taken when using this macro with other func- 
tions that set the attenuation coefficients, such as GXInitLightAttn, GXInitLightAttnA, 
GXInitLightAttnk, GXInitLightDistAttn, and GXInitLightSpot. 


A plot of the shininess attenuation function is shown below for various values of shininess s. The plot 
shows that higher values of shininess result in a sharper fall-off in the attenuation function. Minimum 


attenuation occurs when i¥-# is +/- 1.0. 


Figure 22 - GXInitLightShininess values 
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6.7 Vertex performance 


"Table 2 - Vertex performance" on page 51 lists the peak vertex data rate for various combinations of ver- 
tex data and lighting. Actual performance depends on a number of game-specific variables and should be 
determined empirically (see "6.8 Lighting performance" on page 51 for an example). This table takes into 
account transform, lighting, and setup performance. 


¢ #T: Number of (s, ¢) textures. 

¢« #PT: Number of (s, ¢, q) projected textures. 

¢ #L: Number of local diffuse or specular lights. 
¢ #BM: Number of generated bump maps. 


¢  #C: Host-supplied color. 
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Table 2 - Vertex performance 


Vertex Mode Performance 









































1C Vertex 32.4MV/s 
1T Vertex 27.0MV/s 
[1 PT] [1 L] Vertex 20.3MV/s 
2T 1L Vertex 14.7MV/s 
3T iL Vertex 11.6MV/s 
3PT 1L Vertex 11.6MV/s 
2T 2L Vertex 14.7MV/s 
2T 4L Vertex 9.5MV/s 
4PT 4L Vertex 9.5MV/s 
8T 4L Vertex 5.6MV/s 
8PT 4L Vertex 5.6MV/s 
2T 1BM 1L Vertex 9.5MV/s 
3T 2BM 1L Vertex 5.6MV/s 
3T 2BM 4L Vertex 4.0MV/s 














6.8 Lighting performance 


In hardware, the lighting pipeline computes three components per light. If a color channel is being com- 
puted, the components correspond to R/G/B. If an alpha channel is being computed, the components cor- 
respond to A/A/A. Each light costs four cycles (at 162MHZ) for all the lights that will contribute to a channel, 
plus one cycle. The discussion below applies only to local diffuse lighting (i.e., no texture coordinate gener- 
ation, bump mapping, etc.). It is assumed that the vertex supplies only a color for each active channel. 


For example, if channel GX_COLORO uses two lights and channel GX_ALPHAO uses one light, then the per- 
formance is 2*4 + 1*4 + 1 = 13 cycles, or 162/13 million vertices/second (~12.5 million vertices/second). If 
channel 0 (GX_COLORO and GX_ALPHAO) uses three lights and channel 1 (GX_COLOR1 and GX_ALPHA1) 

uses four lights, the total performance is 3*4 + 4*4 + 1 = 27 cycles, or 162/27 million vertices/second (~6.0 
million vertices/second). 


You can use the performance counter function described in "14 Performance metrics" on page 147 to 
determine the number of Graphics Processor clocks required per vertex, given the current lighting and tex- 
ture coordinate generation settings. You can also use the online GX calculator, titled Vertex Performance 
Calculator, in the Dolphin Reference Manual (HTML). 


Code 32 - Setting lighting controls and texture coordinate generation 


// set lighting controls and texture coordinate generation first 
clks per vtx = GxXPerfMetric (GX _PERF_CLKSPERVERTEX) ; 

clks_ per vtx = GXPerfMetric(GX_PERF_ NONE) ; 

OSReport ("M Vtx/sec = %f\n”, 200.0f/clks_ per vtx); 
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7 Texture coordinate generation 


7.1. Specifying texgens 


The GP has many ways to generate texture coordinates. This section briefly describes the major aspects 
of the GXSet TexCoordGen function. For details on applications of texture coordinate generation, refer to 
the Advanced Rendering section in this guide. Also, see "15 GX updates for HW2" on page 155 for the 
additional texture generation features of HW2. 


Code 33 - GXSetTexCoordGen 


GxXSetTexCoordGen ( 
GXTexCoordID dst_coord, 
GXTexGenType func, 
GxTexGenSrec src_param, 
GXTexMtx mtx ); 


The texture coordinate generation function takes this general form: 
Equation 19 - Texture coordinate generation 
dst_coord = func(src_param, mtx) 


The input data described by the current vertex descriptor is transformed into a texture coordinate. The 
most common function of texture coordinate generation is to transform the src_param by a 2x4 or 3x4 tex- 
ture matrix, mtx. In this case, func is set to either GX_TG_MTX2x4 or GX_TG MTX3x4. 


Equation 20 - Transforming src_param by 2x4 and 3x4 matrices 
(s,t) = [GX_TEXMTX*][InputCoord] // GX_TG_MTX 2x4 
or 
(s,t,q) = [GX_TEXMTX*][InputCoord] || GX_TG_MTX 3x4 

Input coordinates, src_param, are one of: 


Equation 21 - Input coordinates 


(s, ¢, 1.0 1.0) GY _TG_TEy* 

(x y z 10) WGY_TG_PoOSs 

in, 2, ‘ 1.0) #G¥_TG_NRM 

oe b, b, 1.0) 4G¥_TG_BINRM 

(:, t, 1.0) GX _TG_TANGENT 

Note: ve coordinates for GX_TG MTX2x4 and GX_TG MTX3x4 functions are the untransformed ver- 


tex data. 


At a minimum, the GP always transforms input texture coordinates by a matrix. By default, the available 
texture matrices are described by the enumeration GXTexMt x. To pass input texture coordinates 
unchanged into output coordinates, use the GX_ IDENTITY matrix. You must always generate a consecu- 
tive number of texture coordinates starting at G<_TEXCOORDO. 
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In addition to transforming texture coordinates, you can also transform positions and normals to create out- 
put texture coordinates. Transforming an input texture coordinate using a 2x4 matrix is useful for transla- 
tion and rotation effects. A 3x4 matrix is useful for projecting textures and reflection mapping. 


Note: You can change the order of output texture coordinates from the input coordinate order using the 
GXSetTexCoordGen function. In addition, you can use a single input parameter to generate mul- 
tiple texture coordinates. You can also ignore certain input parameters. This allows you to turn var- 
ious layers of texture on and off without needing to build a new display list for each version (at the 
cost of sending vertex data you don’t use). 


The bump mapping function (func = Gx_TG_BUMP*) of texture coordinate generation supports the 
embossing style of bump mapping. This style of bump mapping is useful when the surface geometry of an 
object is being animated. Texture coordinate generation carried out with the bump mapping function also 
affects the vertex lighting hardware. A maximum of three GX_TG BUMP* texture coordinates are supported 
simultaneously. See the Advanced Rendering section for more details on embossed bump mapping. 


Texture coordinates can also be generated from the red and green components of a particular lighting 
channel (func = GX_TG_SRTG). These can be used to create “cartoon” lighting functions, with sharp transi- 
tions between arbitrary colors. Normally, the red component represents the intensity function of a single 
local diffuse light. The green channel is programmed as a material control. The red channel is mapped to 
the s-coordinate. The green channel is mapped to the f-coordinate. The s-coordinate (the diffuse light 
intensity) is mapped by an arbitrary 1D texture lookup into a color. The t-coordinate can be used to select 
which 1D table to use from an array of tables (i.e., a 2D texture). A maximum of two texture coordinates 
may be generated using GX_TG SRTG. The first one must be used with color channel 0, while the second 
can only be used with color channel 1. See the Advanced Rendering section for more details on cartoon 
lighting. 


Generated texture coordinates must be sorted by function. That is, texture coordinates generated by sim- 
ple transforms should occur first, followed by bump map coordinate generation, and finally the generation 
of any texture coordinates based on lighting results. 


Table 3 - Texture coordinate generation order 








SX-Xe [Vil n-\o MO) ae(-1g WK-y qcolc-Mexeloycelfar-licMel-Val-le-liolimmelareicela 
First GX TG MTX2x4, GX TG MTX3x4 
Next GX_TG BUMPO-7 
Last GX_TG SRTG (maximum of 2 texture coordinates gener- 
ated) 














In addition to setting the texture-coordinate generation functions, one must also specify how many coordi- 
nates are being generated. This is performed by the following function: 


Code 34 - GXSetNumTexGens 


GxXSetNumTexGens( u8 nTexGens ); 


The default number of texgens (set in GXInit) is one. If no vertex lighting is enabled 
(GXSetNumChans (0) ), then at least one texture coordinate must be generated. If at least one channel is 
being lit (GXSetNumChans (1) ), then the number of texgens can be set to zero. 
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7.2 Other texture coordinate generation issues 


In the GP, texture coordinates must be scaled to the size of the texture to which they are being applied. 
Normally, this is taken care of automatically by GX. However, you can also control the texture coordinate 
scaling manually by calling this function: 


Code 35 - GXSetTexCoordScaleManually 


void GXSetTexCoordScaleManually(GXTexCoordID coord, GXBool enable, ul6 ss, ul6 ts) 


One application of this function is discussed in "10 Indirect texture mapping" on page 103. 


The GP also has a feature that allows cylindrical geometry to be wrapped with a texture without having to 
specify different texture coordinates for the seam vertices. This is known as cylindrical texture wrapping, 
and it can be enabled with the following function: 


Code 36 - GXSetTexCoordCylWrap 


void GXSetTexCoordCylWrap (GXTexCoordID coord, GXBool s_ enable, GXBool t_enable) 


In effect, this function computes the spread between each vertex’s texture coordinate and the minimum 
coordinate, then subtracts 1 from any coordinate where the spread exceeds 0.5. 


7.3. Texture coordinate generation performance 


The functions that are based on lighting, G<_TG BUMP* and GX_TG_SRTG, exhibit performance similar to 
lights (see "6 Vertex lighting” on page 41 for vertex lighting performance details). Each GxX_TG SRTG is 
equivalent to a light, or four clocks for each coordinate generated plus one clock. Bump mapping is equiv- 
alent to two lights, or eight clocks for each coordinate generated plus one clock. 


The GX_MTX_2x4 and GX_MTX_ 3x4 functions take three clocks per component. For the special case of a 
single (s, t) texture coordinate transform, only two clocks are required. 


In general, the actual vertex performance is a function of lighting, texture coordinate generation, input data, 
and output data parameters. To compute the performance for a particular configuration, use the Vertex 
Performance Worksheet in the Performance section of the GX API pages in the Dolphin Reference Manual 
(HTML). You may also query the hardware to determine the current target vertex performance using the 


GXPerfMetric facility (see "Code 32 - Setting lighting controls and texture coordinate generation" on 
page 51). 
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8 Texture mapping 


Nintendo GameCube has powerful texture mapping features, including single-cycle mipmapping, com- 
pressed color texture and color index textures, anisotropic texture filtering, multitexture support (in multiple 
cycles), indirect textures, and Z textures. The graphics processor (GP) textures four pixels per clock at a 
162 MHz clock speed, for a peak mipmapped texture rate of 648 million pixels/second. 


A flexible 1MB Texture Memory (TMEM) can be configured as multiple texture caches, color index lookup 
tables (TLUTs), or it can be preloaded with textures. Texture caches can automatically prefetch texels from 
main memory as needed. The TMEM is an embedded high-speed DRAM and is physically separate from 
the embedded frame buffer (EFB). Textures are prefetched in parallel with rendering. 


The GP can be configured to Z-buffer before texturing, eliminating texture fetches for pixels that are not 
visible. 


The texture hardware performs perspective correction, Level of Detail (LOD), and coordinate operations 
like clamping, repeating, or mirroring at the rate of 4 pixels per clock per image. To support multitexturing, 
the state describing up to 8 active textures is stored in the GP. Multitexturing is accomplished by process- 
ing a quad of pixels (2x2) through the pipeline over multiple cycles. 


© 2006 Nintendo RVL-06-0037-001-A 
CONFIDENTIAL Released: February 27, 2006 


58 Graphics Library (GX) 


GXInit provides a default configuration of the texture pipeline that abstracts away many of the more com- 
plex texture features. (You can override this configuration easily, as described later in this chapter and in "9_ 
Texture environment (TEV)" on page 89.) The following example shows a simple program that loads and 
uses a texture. 


8.1 Example: Drawing a textured triangle 


Code 37 - Simple texture example 


Project: Dolphin 
File: example7.l.c 


#include <demo.h> 


#define BALL64 TEX _ID 8 

/* a * 
Model Data 

*#--- a */ 


static s8 Vert_s8[] ATTRIBUTE ALIGN(32) = 
{ 
-100, 100, 0, // 0 
100, 100, 0, // 1 
-100, -100, 0 // 2 
}; 


static u32 Colors u32[] ATTRIBUTE ALIGN (32) = 
{ 
// rgba 
Oxff0000ff£, // 0 
OxO0ffO0ff, // 1 
Ox0000fff£ // 2 
}; 


// Array of texture coordinates 

static u8 TexCoords u8[] ATTRIBUTE ALIGN(32) = 

{ 

// s E: fixed point format is unsigned 8.0 
0x00, 0x00, // 0 
0x01, 0x00, // 1 
0x00, Ox01l // 2 


static void CameraInit( Mtx v ); 


/* Safe ais Gmrs Gosia: Sa Gs ca Gs Gave fase ni ns sai asi as sat Stason asec ans seas: Sana sis Pg ig ns Sains Hai an Ss ag tape) tat a t- ana-l * 
Application main loop 
SA sf Game’ Gs faves fangs Snecma ansaid nia apni fash Gauss mn msm a aan vn hy SE QS ane ee */ 
void main ( void ) 
{ 
PADStatus pad[4]; // Controller state 
GXTex0Obj texObj; // texture object 
Mtx Vv; // view matrix 
us a // loop variable 
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TexPalettePtr tpl = 0; 
pad[0].button = 0; 
DEMOInit (NULL) ; // Init os, pad, gx, vi 


CameralInit(v); 
GXLoadPosMtxImm(v, GX_PNMTX0) ; 


GXSetNumChans(1); // Enable light channel; by default = vertex color 


GxXClearVtxDesc(); 

GXSetVtxDesc(GX_VA_POS, GX_INDEX8); 

GXSetVtxDesc (GX_VA_CLRO, GX_INDEX8); 

// Add an indexed texture coordinate to the vertex description 
GXSetVtxDesc (GX_VA_TEX0, GX_INDEX8) ; 


GXSetArray(GX_VA_POS, Vert_s8, 3*sizeof(s8)); 
GXSetArray(GX_VA_CLRO, Colors u32, 1*sizeof(u32)); 
GXSetArray(GX_VA_TEXO, TexCoords_u8, 2*sizeof(u8)); 


GXSetVtxAttrFmt (GX_VTXFMTO, GX_VA_POS, GX_POS XYZ, GxX_S8, 
GXSetVtxAttrFmt (GX _VTXFMTO, GX_VA_CLRO, GX_CLR_RGBA, GX_RGBA8, 
// Describe the texture coordinate format 

GXSetVtxAttrFmt (GX _VTXFMTO, GX_VA_TEXO, GX_TEX_ST, GxX_U8, 


// Load the texture palette 

TEXGetTexPalette(&tpl, "gxTextures.tpl"); 

// Initialize a texture object to contain the correct texture 
TEXGetTexObjFromPalette (tpl, &texObj, BALL64 TEX ID); 

// Load the texture object, tex0 is used in stage 0 
GXLoadTexObj (&texObj, GX_TEXMAPO) ; 





// Set the Texture Environment (Tev) Mode for stage 0 
// GXInit sets default of 1 TexCoordGen 
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// Default TexCoordGen is texcoord(n) from tex(n) with 2x4 identity mtx 


// Default number of tev stages is 1 

// Default stageO uses texcoord0, texmap0, color0a0 
// Only need to change the tevop 

GXSetTevOp (GX_TEVSTAGEO, GX_MODULATE) ; 


while (!pad[0] .button) 
{ 
DEMOBeforeRender () ; 
// draw a triangle 
GXBegin(GX_TRIANGLES, GX_VTXFMTO, 3) 
for (i = 0; i < 3; itt) 
{ 
GXPosition1x8 (i); 
GXColor1x8 (i); 
// Add texture coordinate index 
GXTexCoord1x8 (i); 
} 
GXEnd(); 
DEMODoneRender () 7 
PADRead (pad) ; 
} 
OSHalt ("End of demo"); 


static void CameraInit ( Mtx v ) 
{ 


Mtx44 p; 
Vec camPt = {0.0F, 0.0F, 800.0F}; 
Vec at = {0.0F, 0.0F, -100.0F}; 
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Vec up = {0.0F, 1.0F, 0.0F}; 


MIXFrustum(p, 240.0F,-240.0PF,-320.0F, 320.0F, 500, -2000); 
GXSetProjection(p, GX PERSPECTIVE) ; 
MTXLookAt(v, &camPt, &up, &at); 


The GX API requires the following basic steps to use texturing: 
1. Load textures into main memory. 


2. Allocate a texture object (GXTexObj) structure for each texture. 
3. Initialize the texture object (GXInitTex0b 7) to describe the texture. 
4. Load a texture object (GXLoadTex0bj) into the GP to activate the texture. 


For color index textures, you should perform the following additional steps: 
1. Load TLUTs into main memory. 


2. Allocate a TLUT object (GXT1lut0Obj) structure for each TLUT. 

3. Initialize the TLUT object (GXInitTlut0Obj) to describe the TLUT. 

4. Load the TLUT (GXLoadT1lut) into one of the named TLUT regions of texture memory (GXTluts). 
5 


Associate a TLUT name with a texture when initializing the color-index texture object 
(GXInitTexObjCI). 


These steps are explained in more detail in the sections that follow. 


8.2 Loading a texture into main memory 


The first step in using a texture is to load it into main memory from the optical disc. The software released 
with the Nintendo GameCube development system includes a program called TexConv. exe for convert- 
ing common images files to a “Texture Palette” (TPL, extension .tp1) format. TPL files can be stored on 
disc and loaded using the texPalette library. This library will load a TPL file and return a texture object that 
can be used by the GX API. These utilities are described further in the Nintendo GameCube Character 
Pipeline and CG Tools Guide. 

8.3. Code describing a texture object 


The GX API uses a GXTexObj structure to describe the various parameters associated with a texture, and 
it is the user’s responsibility to allocate the memory for this structure. Parameters include: 


* A pointer (aligned to 32B) to the texture image data. 
« The texture’s format. 

¢ The width and height of the texture. 

« The texture filter modes. 


« The texture wrapping controls. 
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The user initializes or changes a GXTexObj using GXInitTex0bj (for non-color index textures) or 
GXInitTexObjCI (for color-index textures). 


Code 38 - Initializing or changing a texture object 


GXInitTex0Ob}j ( 
GXTexObj* obj, 
void* image ptr, 
ul6é width, 
ul6é height, 
GXTexFormats format, 
GXTexWrapModes wrap _s, 
GXTexWrapModes wrap t, 
GXBool mipmap ); 
GXInitTexObjCI ( 
GXTex0Obj* obj, 
void* image ptr, 
ul6é width, 
ul6é height, 
GXCITexFmt format, 
GXTexWrapModes wrap _s, 
GXTexWrapModes wrap t, 
GXBool mipmap, 
u32 tlut_name ); 


Additional mipmap controls (which are set to default values by the Gx InitTexObj* functions listed 


above) can be set using GXInitTexObjLOD. The GXInitTex0Obj* functions pre-compile their parame- 
ters into hardware register state settings for optimum performance. The GXTex0Obj structure is used to 
store these pre-compiled values and so are not directly readable by the application. Instead, you should 
use the GxGet* functions to read the contents of a GXTexObj. 


8.3.1 Texel formats 


The parameter format in GXInitTexObj and GXInitTexObjCI sets the texture format. The table below 
lists the possible formats. 


Note: GXInitTexObjCI should only set color-index formats and GXInitTex0b j should only set non- 
color-index texture formats. 


© 2006 Nintendo 
CONFIDENTIAL 


RVL-06-0037-001-A 
Released: February 27, 2006 


62 


Table 4 - Texel formats 


Texture Format 
Name 


Description 


Graphics Library (GX) 


Mipmap Filter Modes 









































GX_TF_l4 Intensity 4 bit all 
GX_TF_I8 Intensity 8 bit all 
GX_TF_IA4 Intensity + Alpha 8 bit (4+4) all 
GX_TF_IA8 Intensity + Alpha 16 bit (8+8) all 
GX_TF_C4 Color Index 4 bit LIN_MIP_NEAR 
GX_TF_C8 Color Index 8 bit LIN_MIP_NEAR 
GX_TF_C14X2 Color Index 16 bit (14b index) LIN_MIP_NEAR 
GX_TF_RGB565 RGB 16 bit (565) all 
GX_TF_RGB5A3 When MSB = 1, RGB555 format (opaque) all 

When MSB = 0, RGBA4443 format (transpar- 

ent) 
GX_TF_RGBA8 RGBA 32 bit (8888) all 
GX_TF_CMPR Compressed 4 bits/texel, ~RGB8A1 all 








The texture filter always processes 4-channel colors, RGBA, where each channel is 8 bits wide. In general, 
when the size of the texel component is less than 8 bits, the most significant bits (MSBs) of the texel are 
copied into the least significant bits (LSBs) of the color channel: 


Code 39 - Texture component promotion to 8 bits 


Input Texel = 0xa5a5 (RGB565 format) 
Hex Binary 
0x14, 10100 
Oxa5, 10100 101 


Input Channel Red 
Filter Channel Red 


Input Channel Grn 
Filter Channel Grn 


Ox2a, 101101 
Oxb6, 101101_10 


Input Channel Blu 
Filter Channel Blu 


0x05, 00101 
0x29, 00101 001 











This method guarantees that the entire color range from 0 to 255 is utilized. 


Note: For the intensity formats, the intensity component is copied to all four of the RGBA channels. For 
the intensity/alpha texels, the intensity value of the texel is copied into only the RGB channels. For 
the RGB texture formats, the alpha channel is set to opaque (A = Oxff). 
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RGB5AS3 is a 16-bit format that uses the MSB of each texel to indicate if the texel is opaque (MSB == 1)or 
transparent (MSB == 0). When the texel is opaque, the remaining 15 bits are assumed to be in a 5/5/5 
RGB format. When the texel is transparent the format is assumed to be 4/4/4/3 RGBA. The 8-bit alpha 
channel is formed as described above; the three bits of alpha are replicated starting at the MSBs until an 8- 
bit field is created. This format allows eight equal levels of transparency from fully transparent (Alpha = 
0x00), to fully opaque (Alpha = Oxff). 


Note: This format has two ways to represent opaque: when the MSB is 1, or when the MSB is 0 and the 
three bits of alpha are all 1’s. 


Compressed textures (GX_TF_CMPR) are stored as such in TMEM. Decompression occurs after texture 
lookup and before filtering. This results in storage savings in TMEM and main memory, and lower main 
memory to TMEM bandwidth requirements. 


8.3.2 Texture Lookup Table (TLUT) formats 
The table below lists the possible TLUT formats. There is no support for 24-bit or 32-bit TLUT formats. 
Table 5 - TLUT formats 


TLUT Format Name |DY=X-Yeq alo) dfoy a} 








GX_TL_IA8 Intensity + Alpha 16-bit (81 + 8A) 

GX_TL_RGB565 RGB 16-bit (R5 + G6 + BS) 

GX_TL_RGB5A3 When MSB = 1, RGB555 format (opaque) 
When MSB = 0, RGBA4443 format (transparent) 











The color index lookup occurs before filtering. The color that is looked up from the TLUT is converted into 
an 8-bit per-component (RGBA) format for filtering in the same manner described in the previous section. 
8.3.3 Texture image formats 

Textures are stored in main memory as a row-column matrix of tiles with the following attributes: 

* Each tile is a small sub-rectangle from the texture image. 

« Each tile is a 4x4, 8x4, or 8x8 texel rectangle. 

* Each tile is 32B, corresponding to the texture cache line size. 

« Textures must be aligned to 32B in main memory and be a multiple of 32B in size. 

* Texels are packed differently within a 32B tile depending on their type. 


The level of detail maps for a mipmap are stored together one after another in main memory. The mipmap 
image data is referenced by the base pointer of LOD 0. The size of a mipmap refers to the size of the LOD 
0 map. Mipmaps must be a power of 2 texels in width and height but not necessarily square. All the other 
LOD addresses can be calculated by the hardware from size and format information, and from the LOD 0 
base pointer. 


See Appendix C for texture formatting details. The texture conversion tool, TexConv.exe, can convert 
from common image formats to all the Nintendo GameCube formats. See the Nintendo GameCube Char- 
acter Pipeline and CG Tools Guide for more details on this tool. 
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8.3.4 Texture coordinate space 


Input texture coordinates (those supplied by the application to the GP) are map-relative (normalized) coor- 
dinates. The following diagram shows how to map a single texture image onto a quad: 


Figure 23 - Map-relative texture coordinates 


Quad Primitive 


(0.0, 0.0) (1.0, 0.0) 





Textured Quad 


(0, 0) (1.0*m, 0) 














(0.0, 1.0) (1.0, 1.0) > 


Texture (any size) 

















(0, 0) (m, 0) (0, 1.0*n) (1.0*m, 1.0*n) 




















(0, n) (m, n) 


The GP will scale the input map-relative coordinates into texel coordinate space by multiplying the s-coor- 
dinate by the texture width and by multiplying the t-coordinate by the texture height. 


Note: Each texture coordinate is associated with a texture map using the GXSetTevOrder function 


described in "9 Texture environment (TEV)" on page 89. 


The texel coordinate space ranges between +/-64K-texels. The largest texture image size is 1K by 1K-tex- 
els. The maximum texel coordinate extent across a primitive is 128K texels regardless of the image size. 
This implies a maximum number of repeats that can occur across a single primitive that depends on the 
texture size. For example, a 1K-texel image can repeat a maximum of 128 times across a single primitive. 


The parameters wrap_s and wrap_t of the GXInitTexObj and GXInitTex0ObjCI functions control the 
texture coordinate operation. Texture coordinates can be operated on independently in one of three ways: 
either clamped (GX_CLAMP), repeated (GX_ REPEAT), or mirrored (GX_MIRROR). 














When clamping (GX_CLAMP), the texture coordinate is clamped within the bounds of the image. If the s- 
coordinate is negative, column 0 of the image is used. If the s-coordinate exceeds the width of the image, 
then the last column of the image is used. This clamping operation also occurs independently for the f- 
coordinate using the height of the image. When mipmapping, the width and height of the image should be 
powers of two, but the image does not have to be square. 
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Texture borders are sometimes used to piece together larger textures from smaller textures (tiled). The 
border area contains texels from the adjacent texture(s) so that filtering will use the correct information. 
Nintendo GameCube has no support for texture borders. For planar (non-mipmapped) images, the border 
can be included in the image itself because the GP supports arbitrary image width and height. Mipmapped 
images do not support borders, so they cannot be tiled without visible seams (unless borders are carefully 
designed into the textures themselves). 





When repeating (GX_REPEAT), the (s, t) coordinates are modulo’d by the width/height of the image. 
Repeating is only valid for power of 2 image sizes. You can repeat textures to replicate a small texture over 
a large surface. 











When mirroring (GX_MIRROR), the (s, f) coordinates are modulo’d by the width/height of the image, similar 
to GX_REPEAT. In addition, the coordinates are 1’s complemented on every other wrap. Mirroring a texture 
is useful when an image is symmetrical about the (s, t) axis, like a tree texture. Mirroring is also useful for 
eliminating the seams that naturally occur when GX_REPEAT-ing a small texture. 


























The figures below illustrate the coordinate operations for a texture that is four texels wide for both linear 
and nearest filters. 


Note: When using linear filtering, a 2 texel offset is subtracted to ensure proper spatial alignment of mip- 






































map levels. 
Figure 24 - Linear filter—clamp, repeat, mirror 
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Figure 25 - Nearest filter—clamp, repeat, mirror 
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8.3.5 Filter modes and LOD controls 


Filter modes and LOD controls are set to default values based on the state of the mipmap flag in the 
GXInitTexObj and GXInitTexObjCI functions. These default values can be overridden by a subse- 
quent call to Gx InitTexObjLOD. The parameters described in this section are set using the 
GXInitTexOb 4 LOD function. 


Code 40 - GXinitTexObjLOD 


GXInitTex0ObjLOD ( 
GXTexObj* obj, 
GXTexFilters min_filt, 
GXTexFilters mag _filt, 


£32 min_lod, 
£32 max_lod, 
£32 lod_bias, 
GXBool bias_clamp, 
GXBool do_edge_ lod, 


GXAnisotropy max_aniso ); 


Figure 26 - Pixel projected in texture space example 


texture 1 
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view plane 


In the example above, the road texture is projected onto the view plane such that a square pixel on the 
screen maps to an elongated quadrilateral in texture space. The following discussion will reference this 
image of the pixel projected in texture space. 
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The level-of-detail calculation computes the effective texel-to-pixel ratio for a quad (2x2) of pixels. A log 
base 2 function converts the ratio into a corresponding mipmap level and a fraction, called LOD. Negative 
LOD indicates magnification, while positive LOD indicates minification. There are two methods of comput- 
ing LOD. By setting do_edge_lod to GX_ TRUE, LOD is computed using the distance in (s, t) between aaja- 
cent pixels in the quad. If you set do_edge_lod to GX_ FALSE, LOD is computed using the distance in (s, t) 
between diagonal pixels in the quad. 


Figure 27 - LOD calculation 
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In other words, the LOD calculation assumes that a pixel projects to a square in texture space. This is only 
true if the polygon is roughly facing the viewer. The square filter pattern used by normal mipmapping is 
called an isotropic (uniform shape) filter. When the polygon to be rendered is oblique to the viewer, the 
pixel projected into texture space is distorted into a quadrilateral. In this case, the LOD calculation makes 
the square encompass the quadrilateral, resulting in excessive blurring in one of the coordinate directions. 








There are two methods available in the GP for improving the overblurring behavior of the LOD computa- 
tion: LOD biasing and anisotropic filtering. 
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The computed LOD value can be adjusted using the /lod_bias parameter of the GXInitTex0ObjLOD func- 
tion. Lod_bias can be used to prevent texture from becoming too blurry due to the conservative nature of 
the LOD calculation. Lod_bias must be in the range —-4.0 to +3.99. The result of LOD + /od_bias is clamped 
between the min_lod and max_lod parameters. The min_lod and max_lod parameters define the usable 
region of the texture pyramid and can range from 0.0 to 10.0. The following figure shows how a negative 
LOD_bias can be added to the computed LOD to effectively shrink the filter footprint in texture space. 


Figure 28 - LOD bias 
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filter footprint 
(lod_bias is negative) 
LOD bias will sharpen the texture when the polygon is oblique (the desired effect) but also when it is not. 
To alleviate this problem, the bias_clamp parameter can be used to lessen the effect of lod_bias when the 
polygon is more perpendicular to the view direction. When bias_clamp is enabled, the biased LOD will be 
clamped to the minimum extent of the pixel projected in texture space. 


Multiple square trilinear texture filter ‘footprints’ can be iterated to approximate the shape of the quadrilat- 
eral. This type of filter produces a sharper pixel and is said to be anisotropic (non-uniform shape). The 
maximum number of ‘footprints’ allowed is programmable using the max_aniso parameter. Setting 
max_aniso to GX_ANISO_1 allows only one square footprint and is the standard isotropic mipmap filter. 
Setting max_aniso to GX_ANISO_2 and GX_ANISO_4 allows a maximum of 2 or 4 footprints per pixel, 
respectively. The actual number of footprints used for each pixel is determined by the anisotropy of the 
pixel as computed by the hardware. Trilinear filtering should be enabled (min_filt = GX_LIN_ MIP LIN) 
when using GX_ANISO_ 2 or GX_ANISO_ 4. Also, edge LOD must be enabled in order for anisotropic filter- 
ing to take place. 
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Multiple iterations of the texture filter require multiple internal cycles of the texture hardware. For example, 
if GX_ANISO_4 is enabled, and the polygon being rendered requires the maximum four filter steps, the 
peak fill rate will be divided by a factor of four. Anisotropic filtering does not lower the number of available 


TEV stages (see "9 Texture environment (TEV)" on page 89). 


Figure 29 - Anisotropic filtering 
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The maximum image size is 1K x 1K-texels, so the largest mipmap pyramid is 11 levels of detail. LOD 0 
always refers to the highest resolution LOD, regardless of the image size. In the case of the 1K x 1K-texel 
mipmap, LOD 10 is the coarsest resolution (1 x 1 texel). 


Figure 30 - Mipmap pyramid for the largest texture size 
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Magnification 


When LOD is negative, the LOD is said to be in the magnification region. In other words, the pixel pro- 
jected into texture space covers only a fraction of a texel. This causes the LOD 0 texture to be magnified 
on the display device. 








When the texture is in the magnification region, you can choose between GX_NEAR and GX_ LINEAR filter 
modes using the mag_filt parameter. 
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Within an image, the GX_NEAR filter mode indicates that the closest texel to the pixel’s s- and f-coordinates 
is chosen, as shown in the figure below: 


Figure 31 - GX_NEAR 











Pixel Color P =A 


The GX_LINEAR filter mode indicates that the nearest four texels to the pixel’s (s, f) coordinates should be 
bilinearly interpolated using the (s, ¢) fractional bits, as shown in the figure below: 


Figure 32 - GX_LINEAR 























Cc D 














sO = A + (B-A)*s_fraction 
s1=C+(D-C)*s_ fraction 


Pixel Color P = sO + (s1-s0)*t_fraction 


When LOD is positive, the LOD is said to be in the minification region. In other words, the pixel projected 
into texture space covers more than one texel, and to achieve a 1:1 texel to pixel ratio, these texels must 
be filtered. The texture will appear smaller on the display device. 
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As illustrated in the table below, when the texture is in the minification region, you can choose between 
several filter modes using the min_filt parameter. 


Table 6 - Mipmap minimum filter modes 














Within LODn, LODn+1 Between LODn, LODn+1 
(based on s, t fraction) (based on LOD fraction) 
GX_NEAR_MIP_NEAR Use texel nearest to sample point. Use nearest LOD. 
GX_NEAR_MIP_LIN Use texel nearest to sample point. Interpolate between two 
closest LODs. 
GX_LIN_MIP_NEAR Bilinearly interpolate four texels Use nearest LOD. 
surrounding sample point. 
GX_LIN_MIP_LIN Bilinearly interpolate four texels Interpolate between two 
surrounding sample point. closest LODs 

















Color-index textures cannot use the GX_NEAR MIP LINorGX_LIN MIP_LIN filter modes. All other tex- 
ture types, including compressed texture, can use any min_filt filter mode. 





8.4 Loading texture objects 


As shown in "9 Texture environment (TEV)" on page 89, the default texture pipeline configuration accepts 
up to eight texture coordinates, each associated with a texture map. To associate a texture with 
GX_TEXMAPO, for example, you use the following function: 


Code 41 - GXLoadTexObj 





GXLoadTexObj (GX_TEXMAPO, &myTexObj) ; 


Loading a texture object only loads the state describing the texture into the hardware. The default GX con- 
figuration allocates texture memory as caches and automatically assigns a cache to use with this texture 
when you call GXLoadTexObj. Once you have loaded a texture object, you may render polygons using 
that texture. Up to eight texture objects can be loaded at once for multitexturing. There is no need to syn- 
chronize the texture state with primitives explicitly (as was the case with N64). The GP automatically syn- 
chronizes state changes with pixels in the rendering pipeline. 


Refer to "9 Texture environment (TEV)" on page 89 for more advanced control of texture coordinate order- 
ing. 
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8.5 Loading Texture Lookup Tables (TLUTs) 
Code 42 - Loading TLUTs 


GXInitTlutob}j ( 
GXTlutObj* obj, 
void* lut, 
GXTlutFmt fmt ); 
GXLoadTlut ( 
GXTlutObj* obj, 
u32 tlut_name, 
ul6 start_entry, 
ul6 n_entries ); 


In order to use a color-indexed texture, you must first load a texture lookup table (TLUTs) into TMEM. The 
default configuration of TMEM allows for 20 TLUTs, 16 with 256 entries and 4 with 1kb entries (see the 
GXTluts enumeration). To load a TLUT, you follow steps similar to loading a texture object: 


1. Describe the location and format of the TLUT in main memory using GXInitTlut0Obj. 
2. Load the TLUT into one of the named TLUTs in texture memory using GXLoadTlut. 
3. Use GXInitTex0ObjCI to describe the color-indexed texture. 

4. Set the tlut_name parameter to the TLUT name you have loaded. 


Pointers to TLUTs in main memory must be aligned to 32 bytes. TLUTs must be a multiple of 16 entries, 
with 16 bits for each entry. The total number of entries loaded using GXLoadTlut must also be a multiple 
of 16. 


8.6 How to override the default texture configuration 


So far, we have discussed the texture pipeline only in the context of the default configuration set by 
GXInit. The purpose of this configuration is to abstract some of the complexities of the GX API in order to 
allow the developer to concentrate on the basic features. However, developers may easily override the 
default configuration in order to tailor the system to fit their specific applications more closely. This section 
discusses how a developer can take advantage of the GX API’s additional flexibility to configure the follow- 
ing options: 


¢ Allocation of TMEM. 


* Binding of textures to texture caches and preloaded textures at run time. 
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8.6.1 Texture regions 


The TMEM is a 1MB high-speed DRAM that can be configured to contain texture caches, preloaded tex- 
tures, and TLUTs. The GX API uses a simple TMEM management scheme that is set up by GXInit. This 
scheme assumes that texture caches and TLUT regions are pre-allocated in TMEM. The default configura- 
tion does not allow use of pre-loaded textures. Regions are simply defined by a TMEM pointer and size. At 
runtime, textures are bound to a particular cache region by calling G<LoadTex0Ob j according to informa- 
tion such as the texture ID of the target and the texture format. TLUT data is loaded into TLUT regions by 
calling GXLoadTlut. GXInit configures TMEM as shown below: 


Figure 33 - Default TMEM configuration 
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Logically, the 1MB TMEM is split into two 512KB low and high banks. The default configuration defines 16 
texture caches for the low bank and 8 texture caches for the high bank. When textures are used, each tex- 
ture ID (GX_TEXMAPO, GX_TEXMAP'1, ... , GX_TEXMAP7) is allocated two texture caches from the low 

bank and one from the high bank. 
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TLUTs that are used for color index format are allocated in the remainder of the high bank. There are 
16x256-entry and 4x1024-entry TLUTs defined. TLUTs must always be allocated in the high bank. 


When texture is without mipmapping and does not have a 32-bit format, only one cache region in the low 
bank (part A in Figure 33) is used. Cache regions in both the low bank and the high bank (parts A and B in 
Figure 33) are used only when a 32-bit format is used. 


When 32-bit formatted mipmapped textures that are not color index formatted are used, even LODs are 
cached in one bank while odd LODs are cached in the opposite bank. Thus, one cache region from the low 
bank and one cache region from the high bank (parts A and B in Figure 33) are used. 


When color index textures with mipmapping are used, both odd LODs and even LODs are allocated in the 
low bank (parts A and C in Figure 33). This is because the color index texture region must always be allo- 
cated in the low bank (that is, they must be in the bank opposite from the TLUTs). 


When 32-bit format with mipmapping is used, it is necessary to have twice the continuous cache region for 
the odd LODs and even LODs as compared to when normal texturing is used. In this case, the necessary 

regions (parts A and C in Figure 33) are allocated from the low bank. However, because a sufficient num- 

ber of regions cannot be secured in the high bank, cache regions and adjoining textures with different map 

IDs are shared. No operational problems are caused when one cache region is shared by two or more tex- 
tures. But performance may be affected. 


The GX API allows the application to configure texture regions using GXInitTexCacheRegion and 
GXInitPreLoadRegion. Application developers can also define their own region-binding schemes by 
registering a callback function with GxSet TexRegionCallback or GXSetTlutRegionCallback. 





8.6.2 Cached regions 
Texture regions describe areas of TMEM that can be used as texture caches or preloaded textures. 


Code 43 - GXInitTexCacheRegion 


GXInitTexCacheRegion ( 


GXTexRegion* region, 
GXBool is_32b mipmap, 
u32 tmem_even, 


GXTexCacheSize size even, 
u32 tmem_odd, 
GXTexCacheSize size _odd ); 


For cached images, the s/t coordinates are translated into cache tag memory addresses. If the addressed 
tag indicates the texture is resident in TMEM, the s/t coordinates are translated into TMEM addresses, and 
the texture is accessed. If the texture is not resident, the GP will issue memory requests to copy the texture 
from main memory to TMEM. Once the texture is resident, the s and t coordinates are translated into 
TMEM addresses and the texture is accessed. The unit of texture access is a texture cache line (32B). All 
textures are stored as a multiple of this line size. The GP makes texture requests ahead of time and stores 
the TMEM addresses in a FIFO. Prefetching allows rendering to proceed during a cache miss. 


The GP uses main memory addresses for tags, so caches can be shared among textures (lower perfor- 
mance) without needing to invalidate between texture loads. Mipmaps that are to be trilinearly filtered must 
allocate a cache region in both the low and high banks. The tmem_even parameter defines the location in 
TMEM where even LODs will be cached. The tmem_odd parameter defines the location in TMEM where 
odd LODs will be cached. Usually, the two types of LODs must be placed in opposite banks. For non-mip- 
mapped (planar) textures (except GX_TF RGBA8), only the *_even parameters need to be defined. Planar 
(except color index) texture regions can be allocated in either the low or high bank, but cannot span both 
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banks. Color-indexed texture regions (both planar and mipmapped) must always be allocated in the low 
TMEM bank. The size_even and size_odd parameters describe the size of the cache region for their 
respective sets of LODs. Unused odd parameters should be set to 0 (address) and GX_TEXCACHE_NONE 
(size). 

















Full-color textures (GX_TF_RGBA8) can only be mipmapped in two cycles, with the even LOD accessed on 
the first cycle and the odd LOD accessed on the next cycle. In this case, the tmem_even cache (which 
must be in the low bank) is used to store the AR (alpha and red) components of the texture and the 
tmem_odd cache (which must be in the high bank) is used to store the GB (green and blue) components. 


Note: This explanation is simplified; the actual storage is slightly different. 


Within each bank, the even LODs are cached first, followed by the odd LODs. For this case, the size_even 
and size_odd parameters refer to the size of the cache region for their respective LODs within one bank. 
Thus the actual cache memory usage will be twice the sum of the even and odd sizes. The parameter 
is_32b_mipmap indicates the region will be used in this manner. 


TMEM caches are sized in terms of superlines (512B or 4x4 lines). TMEM caches can be only one of three 
sizes: 32KB, 128KB, and 512KB. Each cache pointer must be aligned to 2KB (2x2 superlines). 


The default cached regions created by GXInit are 8x8 superlines (32KB). 


8.6.3 TLUT regions 
Code 44 - GXInitTlutRegion 


GXInitTlutRegion ( 
GXTlutRegion* region, 
u32 tmem_addr, 
GXTlutSize tlut_sz ); 


TLUTs must be allocated in the high bank of TMEM. Color-indexed texture regions must be allocated in the 
low bank of TMEM. Each 16-bit entry of a TLUT in main memory is replicated into 16 copies during the 
TLUT load. Therefore, the total memory in bytes that needs to be allocated for a TLUT is t/ut_sz * 16 * 2B. 
A 256-entry TLUT requires an 8KB TLUT region. 


The tmem_adar for the TLUT region must be aligned to 512B (16 entries * 16 * 2B). Furthermore, 
tmem_addr must be aligned to the size of TLUT. For example, a 256-entry TLUT should be aligned to an 
8KB TMEM address. 


The TLUT region size can be any power of 2 ranging from 16 entries to 16kb entries. The tmem_addr is 
bitwise OR’d with the texel index to determine the address of the entry. This makes it possible to create 
index sizes that are smaller than the texel type indicates. 


For example, you can create a 1024-entry TLUT and access it using the GX_TF_C14X2 (16-bit) texture 
type. Since a 1024-entry table requires a 10-bit index, the most significant 6 bits of each index in the tex- 
ture should be set to zero. 


8.6.4 Preloaded regions 
Code 45 - GXInitTexPreLoadRegion() 


void GXInitTexPreLoadRegion ( 


GXTexRegion* region, 
u32 tmem_even, 
u32 size_even, 
u32 tmem_odd, 
u32 size odd ); 
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Preloaded textures are loaded explicitly by the application into a TMEM region. The texture is stored in 
TMEM in the same format as main memory. Unlike cached regions, the texture cache tag memory is not 
checked when accessing a preloaded region. Small, frequently-used textures are good candidates for pre- 
loading. 


Note: Preloaded textures are not supported by the default configuration of TMEM (the entire TMEM is 
mapped as caches or TLUTs). 


The size_even and size_odd parameters to GXInitTexPreLoadRegion specify the size of the texture 
regions in bytes. The size_even region is used for even LODs and the size_odd is used for odd LODs 
when mipmapping. When preloading non-mipmapped (non-GxX_TF_RGBA8) images, you only need to 
specify the *_even parameters. 


Figure 34 - Mipmap in TMEM 
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The tmem_even and tmem_odd parameters to GXInitTexPreLoadRegion for a preloaded texture are 
required only to be 32B-aligned. For preloaded mipmaps, tmem_even and tmem_odd must be in opposite 
TMEM banks. The tmem_even value will define the location of all even LODs and tmem_odd will define 
the location of all odd LODs. 


Note: Even LODs will use more memory in their bank of TMEM than the odd LODs in the opposite bank. 
Figure 35 - Planar texture in TMEM 
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Figure 36 - 32-bit planar texture in TMEM 
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For planar non-color index textures (except 32-bit), only {mem_even is used, and may be located in either 
the high or low bank of TMEM. For color index textures, tmem_even must be in the low bank of TMEM. 
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For 32-bit planar textures (GX_TF_RGBA8), the tmem_even is the address of the AR tiles and must be in 
the low bank of TMEM. The GB tiles are located at tmem_odd and must be in the high bank of TMEM. 


Figure 37 - Color index mipmap in TMEM 


Low Half High Half 
0 16K 


tmem_even 


Level0 





TLUT 


Image: 8-Level Mipmap Pyram id 


‘ Level 2 
notdrawn to scale 





Level 4 











Level6 








tmem_odd 


Level 1 





Level 3 





Level5 
Level7 








16K -1 32K-1 
<— 32:8» <= 328. =F 








RVL-06-0037-001-A © 2006 Nintendo 
Released: February 27, 2006 CONFIDENTIAL 


How to override the default texture configuration 81 


For color index preloaded textures, tmem_even and tmem_odd must be in the low bank of TMEM. The 
hardware will load all even LODs at the tmem_even address and all odd LODs at the {mem_odd address. 


Figure 38 - 32-bit mipmap in TMEM 
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GX TF _RGBAS8 textures are stored as interleaved AR and GB tiles (32B/tile). When preloading a 
GX TF RGBA8 texture, the AR and GB are written to opposite TMEM banks. The tmem_even (AR) tiles 
should be in the low TMEM bank and the tmem_odd (GB) tiles should be in the high TMEM bank. 


Note: 32-bit textures use the same amount of memory in the low and high banks. 
To actually load the texture into TMEM call: 
Code 46 - GXPreLoadEntireTexture() 


GxXPreLoadEntireTexture ( 
GXTexObj* obj, 
GXTexRegion* region ); 
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To associate a hardware texture map ID (GXTexMapID) with the preloaded region, use: 
Code 47 - GXLoadTexObjPreLoaded() 


void GXLoadTex0Obj PreLoaded ( 


GXTexObj* obj, 
GXTexRegion* region, 
GXTexMapID id ); 


8.6.5 Texture cache allocation 


You can override the default TMEM allocation by replacing the callback function that binds texture objects 
to texture regions: 


Code 48 - GXSetTexRegionCallback 


typedef GXTexRegion * (*GXTexRegionCallback) ( GXTexObj *tex_obj ); 


GXTexRegionCallback GXSetTexRegionCallback( GXTexRegionCallback f ); 


This function is called by GkXLoadTexObj and is expected to return a pointer to a GXTexRegion to use for 
this texture. These texture regions can be statically allocated or dynamically allocated according to the 
needs of the application. The TMEM allocation scheme must also consider color index TLUT and pre- 
loaded texture memory requirements. GXSet TexRegionCallback returns the callback that was set prior 
to its invocation. 


The programmer can use the GXInitTex0ObjUserData function to set a pointer to user data in the tex- 
ture object. This data may be needed in order to implement a better TMEM allocation strategy. The data 
can be retrieved using GXGet TexObjUserData. 


8.6.6 TLUT allocation 


If color index textures are to be used, the TMEM allocation scheme must consider TLUT region allocation 
in addition to texture region allocation. As described in "8.6.3 TLUT regions" on page 76, there are more 
restrictions on the placement of color index textures and TLUTs in TMEM than on non-color index textures. 
To override the default TLUT allocation scheme the application must replace the callback function that 
associates a TLUT name with a TLUT region using: 


Code 49 - GXSetTlutRegionCallback 
typedef GxXTlutRegion *(*GXTlutRegionCallback) (u32 name) ); 


GXTlutRegionCallback GxXSetTlutRegionCallback( GXTlutRegionCallback f ); 


The callback function is called by GkLoadTex0bj to associate the TLUT name (GXInitTex0bjCI) with 
a TLUT region. GXLoadT1ut also calls the callback function. GxSet TlutRegionCallback returns the 
callback that was set prior to its invocation. 
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8.7 Invalidating texture cache 


The texture hardware maintains a Cache Tag Memory that maps a texture cache line’s main memory 
address to its TMEM address. Because of this, textures can share a cached region without address colli- 
sions. However, the following situations will require invalidating the texture cache: 


¢ The texture is moved to a new main memory location. 
« Anew texture is copied into the memory occupied by a previously used texture. 
« The application modifies some texels of a texture in main memory. 


Invalidating the texture cache requires resetting the state of certain tag bits in the cache tag memory. This 
will force the reloading of the affected texture. We provide functions for invalidating either a texture region 
or the entire TMEM: 


Code 50 - Invalidating texture memory 


GXInvalidateTexRegion( GXTexRegion* region ); 
GXInvalidateTexAll( void ); 


It is not necessary to invalidate TLUT regions and preloaded regions because they are explicitly loaded 
into TMEM. If the data in a TLUT or preloaded texture is changed, the application must reload it for the 
change to take effect. 


8.8 Changing the usage of TMEM regions 


Sometimes an application will change the use of a particular region of TMEM from preloaded to cached or 
from TLUT to cached. In these cases—and only these—the application should call GXTexModeSync to 
ensure all texels currently in the pipeline are flushed before the change in usage goes into effect. The call 
should be made prior to drawing any primitives that will use the TMEM region in the new mode. When 
changing a TMEM region from cached to preloaded (or TLUT), the command to load the TMEM region will 
synchronize the pipeline automatically. 


8.9 Creating textures by copying the embedded frame buffer 


Textures can be created by copying the Embedded Frame Buffer (EFB) to main memory using the 
GXCopyTex function. This is useful when creating dynamic shadow maps, environment maps, motion blur 
effects, etc. 


All non-color index texture types except compressed textures (GX_TF_CMPR) can be created during the 
copy. The texture copy operation will create the correct tiling and formatting of the texture so it can be read 
directly by the hardware. Optionally, you can apply a box filter to the image in the EFB in order to create a 
lower level of detail (LOD) texture. The box filter can be used to create mipmaps from the EFB data. 
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HW2 adds new options for how information is copied from the EFB into a texture. See "15 GX updates for 
HW2" on page 155 for details. 


Table 7 - Texture copy formats and conversion notes 


Format Conversion 














GX_TF_l4 RGB->(Y)UV, AA and non-AA pixel formats. 

GX_TF_I8 RGB->(Y)UV, AA and non-AA pixel formats. 

GX_TF_A8 A (6 bits) ->A (8-bits, 2 MSBs replicated in LSBs), only with pixel format 
GX _PF_RGBA6 224. 

GX_TF_IA4 RGBA->(Y)UV(A), if pixel format is not GX PF RGBA6 224, then A = Oxf. 

GX_TF_IA8 RGBA->(Y)UV(A), if pixel format is not Gx_PF_RGBA6 224, then A = Oxff. 





GX_TF_RGB565 | RGB->RGB, bits truncated for non-AA pixel formats. 





GX_TF_RGB5A3 | RGBA->RGBA, if pixel format is not GX_PF_RGBA6 224, then MSB=1, i.e. 
RSGSB5. 


GX_TF_RGBA8& | RGBA->RGBA, if pixel format is not GX_PF_RGBA6 224, then A = Oxff. 





GX_TF_Z24X8 Z (24 bits) -> Z (32 bits), only when pixel format is non-antialiased, 
GX_PF_RGB8 224 orGX_PF_RGBA6 224. 














The EFB can be used in one of two basic modes: antialiased (pixel format is GX_PF_RGB565 216) and 
non-antialiased (pixel format is Gk_PF_RGB8 224 orGX PF RGBA6 224). You can copy color textures in 
either mode, but Z textures can only be copied from non-antialiased frame buffers. See "12 Video output" 
on page 125 for more details on the EFB modes. 


Figure 39 - Texture copy data path 
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The RGB-to-YUV conversion that takes place in the texture copy pipeline is the same as that which is used 
for the display copy pipeline for video output (see "12 Video output" on page 125). As a result, the intensity 
is scaled: 16 <= Y <= 235. 
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The following functions control the copying of textures from the EFB to main memory: 


Code 51 - Texture copy functions 


void GxXSetTexCopySrc ( 


ul6é left, 
ul6 top, 
ul6é wd, 
ul6é Pie Ve 


void GxXSetTexCopyDst ( 





ul6é wd, 

ul6é ht, 
GxTexFmt fmt, 
GXBool mipmap ); 


void GxXCopyTex ( 
void* dest, 
GXBool clear ); 


GXSetTexCopySrc specifies the source rectangle to copy from the EFB. All parameters to 
GXSetTexCopySrc must be multiples of two pixels. GxSet TexCopyDst specifies the destination rectan- 
gle in main memory. Normally, the source and destination rectangles would have the same size. However, 
when copying small textures that will be composited into a larger texture, the source and destination rect- 
angles may differ. 


Figure 40 - Copying small textures into a larger texture in main memory 
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The fmt argument of GXSet TexCopyDst can specify a few subtle types of texture copy operations. The 
format GX_TF_ A8 is used specifically to copy the aloha channel from the EFB into a Gx_TF_ 18 formatted 
texture. 


Note: GxX_TF_1I8 will copy the scaled /uminance of the EFB into a Gx_TF_18 texture. When reading a 
texture GX_TF_A8 and GX_TF_18 are equivalent. 


When color textures are converted from an GX_PF_RGB8_ 224 pixel format to a lower-resolution color for- 
mat, such as GX_TF_ RGB565, the least significant bits (LSBs), of the 8-bit colors are truncated. When 
color textures are converted from a lower-resolution pixel format, say GX_PF_RGB565_ 216, toa higher 
resolution texture format, say GX_TF_RGBA8, the most significant bits (MSBs) of each pixel are replicated 
in the LSBs of each texel. This conversion process distributes the estimation error evenly and allows each 
texel to represent the minimum or maximum value. 


In general, you should only copy textures containing alpha from an EFB with format Gk _PF RGBA6 224. 
When copying a texture containing alpha from an EFB without alpha, alpha will be set to its maximum 
value. 


The GX_TF_Z24x8 format can be used to copy the 24-bit Z buffer to a 32-bit texture (equivalent format to 
GX_TF_RGBA8, see Appendix D). The next section describes how this “Z texture” can be used. It is not 
legal to copy out 8-bit or 16-bit Z textures, and you may not copy Z from an antialiased EFB (see "12 Video 
output" on page 125 for more details on EFB formats). 


Note: HW2 relaxes some of these restrictions. See "15 GX updates for HW2" on page 155 for details. 


The function GXCopyTex initiates the copy operation. It can conditionally clear the EFB at the same time if 
clear is true. The update enable flags for each buffer to be cleared must also be enabled (see 
GXSetColorUpdate, GXSetAlphaUpdate, and GXSetZMode). This allows individual buffers to be con- 
ditionally cleared. The copy filter in effect at the time the texture is copied (see GXSetCopyFilter) will 
be applied to the EFB data during the conversion process. 


To copy a texture, the application must first allocate a buffer in main memory that is the size of the texture 
to be copied. This size can be determined using GkGet TexBufferSize. This function takes texture pad- 
ding and texture type into account in its calculations. 





Before a copied texture can be applied to a textured primitive, you must ensure that the copy operation has 
finished. The command GXPixModeSync may be called after GkCopyTex in order to guarantee this. 
GXPixModeSync flushes the pixel pipeline, ensuring that the copy is finished before a new primitive is 
started. 


Sometimes it is useful to determine the screen rectangle that encloses a group of rendered geometry. The 
functions GXClearBoundingBox and GXReadBoundingBox can determine the bounding box of geome- 
try rendered in the EFB. Call GxClearBoundingBox first to reset the bounding box, then render the 
geometry. The GP will update the minimum and maximum screen-space (x, y) continually for each pixel 
drawn. After rendering the geometry, the bounding box values can be read back using 
GXReadBoundingBox. You can use these values to compute the arguments to GXSet TexCopySrc. 


8.10 Z textures 


The GP supports combining a color texture and a Z texture into the Embedded Frame Buffer (EFB). This 

feature can be used to facilitate image-based rendering, in which a frame buffer is a composite of smaller 
color and depth images, like sprites with depth. Each of the sprites can be computed at independent frame 
rates, but the final frame buffer is composited at the target frame rate. 


You can create Z textures by copying a region of the EFB using GXCopyTex with the pixel format (see 
GXSetTexCopyDst) set to GX_TF_224x8. Additionally, 8- and 16-bit Z textures can be created offline or 
by using the CPU. The GxCopyTex function cannot create 8- or 16-bit Z textures (except on HW2; see "15_ 
GX updates for HW2" on page 155). Moreover, Z textures cannot be copied from an antialiased EFB. 
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The texture input to the /ast active TEV stage is used as the Z texture, so the application must be careful to 
arrange the TEV equation so the Z texture is output by this stage. When Z texturing is enabled, color is 
output from the /ast active TEV stage, but the texture input to the last stage is not available (because it is 
used for the Z texture). 


Figure 41 - Z texture block diagram 
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The Z texture can either replace or offset the rasterized Z of the polygon. Normally, each pixel’s Z in a quad 
is computed by adding Z slopes to a reference Z computed at the center of the quad. When Z texture is 
enabled, the Z texels will offset the reference Z (i.e., the Z slopes will not be added, and thus the computed 
Z accuracy will be per-quad, not per-pixel). In addition, a constant bias can be added to the result. Finally, 
if Z buffering is enabled, the resulting Z is compared with the EFB current Z. The pipeline must be config- 
ured to Z buffer after texture lookup when using Z textures. The Z-texture adders do not clamp, so the pro- 
grammer must make sure that there is no overflow. 


Code 52 - GXSetZTexture 


void GxXSetZTexture ( 


GXZTexOps op, 
GXTexFmt fmt, 
u32 bias)? 


The fmt argument above can be one of the following: 
* GX _TF 28 
* GX_TF 216 


* GX TF Z24x8 
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8.11 Texture performance 
This section documents peak texture performance assuming a 200 MHz graphics processor clock speed. 


Table 8 - Texture performance 


Texture 
Count 


Performance 





1 800 million pixels/second 





400 million pixels/second 





267 million pixels/second 





200 million pixels/second 





160 million pixels/second 





133 million pixels/second 


114 million pixels/second 





CO;rnN|O;] oO] &}] Ww] Nh 


100 million pixels/second 














Note: 32-bit textures filter bilinearly at 800Mpixels/second, and trilinearly at 400Mpixels/second. Expect 
less than 15% degradation for properly sampled cached textures. 
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9 Texture environment (TEV) 


9.1 Description 


The Texture Environment (TEV) combines per-vertex lighting, textures, and constant colors to form the 
pixel color (before fogging and blending). The color and alpha components have independent TEV units 
with independent controls. There is only one set of TEV color/alpha-combiners implemented in hardware. 
To implement multi-texture, the TEV hardware is reused over multiple cycles, called TEV stages. Each 
TEV stage has independent controls, and a maximum of 16 TEV stages are supported. A consecutive 
number of TEV stages may be enabled in order to perform multi-texturing. The resulting pixel color is the 
color output from the last active TEV stage. 


A set of four input/output color registers are provided to store temporary results, pass results from one 
stage to the next, or to supply user-defined constant colors. These color registers are shared among all 
TEV stages. The last stage must send its output to the GX_TEVPREV register. (HW2 provides additional 
registers; refer to "15 GX updates for HW2" on page 155.) 














The alpha produced by the last TEV stage is input to an alpha-compare equation. The result of the alpha 
compare can be used to conditionally mask color (and possibly Z) writes to the frame buffer. 


Fog, if enabled, is applied to the pixel values output from the last active TEV stage. Blending operations 
occur after fogging. (Fog and blending are described in later chapters.) 


Figure 42 - TEV block diagram 
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9.2 Default texture pipeline configuration 


By default, the texture pipeline is configured to look like the diagram below. Each texture coordinate 
enabled for a vertex (using GXSet Vt xDesc) is sent down the pipeline in the input order. Each coordinate 
accesses a texture, and the resulting texel is fed to the corresponding TEV stage. Since only eight unique 
textures are available, only eight TEV stages are configured this way; the remaining eight stages are initial- 
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ized with null texture and color inputs. The number of active TEV stages and the number of texture coordi- 
nates that are generated must be set by the application (see GxSetNumTevStages and 
GXSetNumTexGens). By default, only stage 0 produces an output. The default operation for stage 0 is 

GX REPLACE, meaning only the texture color is output (see "9.4 GXSetTevOp" on page 90). 

















Figure 43 - Default texture pipeline 
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The texture pipeline is very configurable. The user can override this simple default configuration as 
described in the following sections. 


9.3. Number of active TEV stages 
To program the number of active TEV stages, use: 


Code 53 - GXSetTevStages 


GxXSetNumTevStages( u8 stages ); 


There must always be at least one TEV stage enabled. TEV stages are enabled consecutively. A maxi- 
mum of 16 TEV stages may be enabled. 


9.4 GXSetTevOp 


We provide the function GXSetTevOp to simplify initial programming demos. It determines the color pro- 
cessing that occurs at the specified TEV stage. This function calls GkxSetTevColorIn, 
GXSetTevAlphaIn, GXSetTevColorOp, and GXSetTevAlphaOp (described later). 


Code 54 - GXSetTevOp 


GXSetTevOp( GXTevStageID stage, GXTevMode mode ); 
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The following table lists the Gx TevMode types and the implied operation. 


Table 9 - GXTevMode types 











Name (Ofo} fo) Od a-Tal al: me] °) PN Volar mOdat-Valal=) me) °) 
GX_MODULATE Cy =C,C, Ay = Ar At 
GX_DECAL Cy=(1-A) C+ AC, Ay=A, 
GX_BLEND Cy =(1-C,) C, + C, Ay =AtAr 
GX_REPLACE Cy = Cy Ay =At 











* For stage 0, subscript ris the rasterized color (from a lighting channel) for this stage. 
¢ For other stages, subscript ris the previous stage output color. 
¢ Subscript t is the texture value for this stage. 


¢ Subscript v is the output color of this stage. 


9.5 Color/alpha combine operations 


As mentioned, GXSetTevOp is a simplifying function that sets the inputs and operation for a given TEV 
stage. In order to use the full power of the TEV, you must set all of these parameters independently. First 
we’ll describe the operation that takes place within a TEV stage, followed by how to set the various inputs 
for the operation. 
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The figure below shows the operation that can be programmed in a given TEV stage. Remember that there 
are two operations happening in a given cycle: one for the color components, and one for the alpha com- 
ponent. 


Figure 44 - TEV operations 
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Op = optional negate Scale = multiply by 1/2, 1, 2, or 4 
Code 55 - GXSetTevColorOp, GXSetTevAlphaOp 


GXSetTevColorOp ( 
GxTevStageID stage, 
GxXTevOp op, 

GXTevBias bias, 
GXTevScale scale, 

GXBool clamp enable, 
GXTevRegID out_reg ); 
GxXSetTevAlphaoOp ( 
GxTevStageID stage, 
GxXTevOp op, 

GXTevBias bias, 
GxXTevScale scale, 

GXBool clamp enable, 
GXTevRegID out_reg ); 





Each TEV stage has its own set of operation controls. The color and alpha operations are set indepen- 
dently using GXSet TevColorOp and GXSetTevAlphaOp. However, the clamp mode, set using 
GXSetTevClampMode, applies to both the color and alpha combiners. (Refer to the next section for 
details about the clamp mode.) 


The TEV operation begins by performing a linear interpolation between A and B inputs, using C as the 
interpolation factor. The inputs A, B, and C are always unsigned 8-bit values having a range [0 <= A,B,C <= 
255]. The result of the interpolation can be optionally negated using op. 


The input D and a bias value (0, +0.5, or —0.5) are then added to the result. The D input is a signed 10-bit 
number having a range [-1024 <= D <= 1023]. The result of each TEV stage can be a signed 10-bit value, 
so this input is provided for accumulating out-of-range values. 


A constant scale (1, 2, 4, or 0.5) is then applied. The result is optionally clamped and written to an output 
register. (See the next section for more details on clamping.) 


For color operations, the same sequence is applied in parallel to the RGB color components. Alpha opera- 
tions are independent of the color operations. 
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The output registers are available as inputs for the next TEV stage. For example, GX_TEVREGO corre- 
sponds to GX_CC_CO for the color TEV input, and GX_CA_AO corresponds to the alpha TEV input. The out- 
put registers can store a signed 10-bit number. The output register GX_ TEVPREV is used by convention to 
pass the results of one TEV stage to the next. GX_TEVPREV must be the output register of the last active 
TEV stage. 





























Table 10 - Correspondence between TEV input and output register names 














TEV register TEV register TEV register 

ray olen mexe) (oy mm at-lani=) ray elUiar- Vi e)areMarclinis) CoOL olULMmar-lini=) 
GX_CC_CO GX_CA_AO GX_TEVREGO 
GX_CC_C1 GX_CA_A1 GX_TEVREG1 
GX_CC_C2 GX_CA_A2 GX_TEVREG2 
GX_CC_CPREV GX_CA_APREV GX_TEVPREV 

















HW2 adds new features to the TEV. Details may be found in "15 GX updates for HW2" on page 155. 


9.5.1 Clamp modes 


The clamp_enable parameter in GXSet TevAlphaOp and GXSetTevColorOp enables or disables clamp- 
ing. The actual clamping mode is controlled by: 


Code 56 - GXSetTevClampMode 


GXSetTevClampMode ( 
GxXTevStageID stage, 
GXTevClampMode clamp mode ); 


The clamp mode is shared by both alpha and color channels in the same TEV stage. 
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The effect of clamp_enable and clamp_mode is shown in the table below: 


Table 11 - Clamp enable and clamp mode 


























fod F-Van) oy at ar-ley(s 
‘(Tate (=) eX=valel=vals od F-Taay omni" Core (=m Clare La-ve ire 1) dare] ae 
| DY=X-Yeq ale) ufo) a} 
Ellelarcie-lile mele) lave mexe) (olmexoyal ine) )) 
control) 
GX_FALSE GX_TC_LINEAR S =(R> 1023) ? 1023 : ((R < -1024) ? -1024) 
:R) 

GX_TRUE GX_TC_LINEAR S = (R> 255) ? 255 : ((R< 0) ?0:R) 
GX_FALSE GX_TC_GE S =(R>=0) ? 255: 0 
GX_TRUE GX_TC_GE S =(R>=0) ?0: 255 
GX_FALSE GX_TC_EQ S = (R == 0) ? 255: 0 
GX_TRUE GX_TC_EQ S =(R == 0) ?0: 255 
GX_FALSE GX_TC_LE S =(R <= 0) ? 255: 0 
GX_TRUE GX_TC_LE S =(R<=0)?0: 255 














Note: On HW2, the non-linear clamp modes have been removed (only the linear clamp modes are imple- 
mented), and thus GxSet TevClampMode is not available. HW2 includes new comparison opera- 
tions to replace the lost clamp modes. Refer to "15 GX updates for HW2" on page 155 for more 
details. 
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Figure 45 - TEV stage color inputs 
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GXSetTeve olorin 


Code 57 - GXSetTevColorin 


GXSetTevColorIn ( 
GxTevStageID 


GXTevColorArg 
GXTevColorArg 
GXTevColorArg 
GxXTevColorArg d ); 


The TEV allows for many color input sources including constant (register) colors and alphas, texture color/ 
alpha, rasterized color/alpha (the result of per-vertex lighting), and a few useful constants. Notice that the 
input controls are independent for each TEV stage. 


The GX_CC_TEXRRR, GX_CC_TI 











EXGGG, GX_CC_TEXBBB inputs can be used to copy one component of the 


texture color to the other texture color channels. This feature is useful for dot product, intensity calculation, 


and color space conversion. 
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The register colors can be programmed directly as constant colors, or used to store the results of TEV 
operations. For example, GX_TEVREGO corresponds to GX_CC_CO for the color TEV input. Unlike the rest 
of the TEV controls, the TEV registers are shared among all the TEV stages. The application must be care- 
ful to partition the use of registers as input colors and output colors when designing a color-combining 
equation. The following code sets register 1 to a constant 8-bit (per component) color and register 2 toa 
constant 10-bit (per component) color: 














Code 58 - Setting constant color 


GXColor cyan = { 0x00, Oxff, Oxff, Oxff }; 
GxXSetTevColor (GX_TEVREG1, cyan); 


GXColorS10 coffset = { -128, -128, -128, 0 }; 
GXSetTevColorS10 (GX _TEVREG2, coffset); 











The default texture pipeline uses the GxX_TEVPREV register to pass the output of one TEV stage to the 
input of the next TEV stage. This is only a convention of the default configuration assumed by 
GXSetTevOp. You can use GX_ TEVPREV as a general-purpose register when programming your own TEV 
equations. However, at least one register is required to pass results from one stage to the next when multi- 
texturing. Note the GX_TEVPREV must be the output register for the last active TEV stage. This register is 
wired directly to the fogging and blending functions. 





























Note also that the inputs A, B, and C are unsigned 8-bit values, but the register inputs Gx_CC_C0-2 and 
GX_CC_CPREV can be signed 10-bit values [-1024...+1023]. When one of these registers is selected for 
the A, B, or C inputs, the /east significant 8 bits of the register are used. 





The rasterized color and alpha inputs are the result of per-vertex lighting on one of the lighting channels, 
either GX_COLOROAO or GX_COLOR1AL1. It is not possible to use the color from one lighting channel with 
the alpha from the other lighting channel (for example GX_COLORO with GX_ALPHA1) into the same TEV 
stage. In the default texture pipeline, GX_COLOROAO is directed to the first eight TEV stages. To change the 
default, you must call GxSet TevOrder, described later in this chapter. 
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9.7. Alpha inputs 
Figure 46 - TEV stage alpha inputs 
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GXSetTevAlphaln 
Each TEV stage combines alpha channels independently of color channels. There a fewer alpha inputs 
than color inputs, and there are no color channels available in the alpha combiner. 


Code 59 - GXSetTevAlphaln 








A 
mee 

C 

D 








Select One 














void GxXSetTevAlphalIn ( 
GXTevStageID stage, 
GXTevAlphaArg a, 
GXTevAlphaArg b, 
GXTevAlphaArg cc, 
GXTevAlphaArg d ); 


Note: The inputs A, B, and C are unsigned 8-bit values, but the register inputs Gx_CA A0-2 and 
GX_CA_APREV can be signed 10-bit values [-1024...+1023]. When one of these registers is 
selected by the A, B, or C inputs, the /east significant 8 bits of the register are used. 





9.8 Example equations 


Equation 22 - Pass texture color 


Cc, =C, 
Code 60 - Pass texture color 
GXSetTevColorin ( GxXSetTevColoroOp ( 
GX_TEVSTAGEO, GX_TEVSTAGEO, 
GX_CC_ZERO, // a GX_TEV_ADD, // op 
GX_CC_ZERO, //b GX_TB_ZERO, // bias 
GX_CC_ZERO, a ere, GX_CS_ SCALE 1, // scale 
GX_CC_TEXC ); // a GX_ENABLE, // clamp 0-255 
GX_TEVPREV) ; // output reg 
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Equation 23 - Modulate 


Cy Pe CC, 

Code 61 - Modulate 

GXSetTevColoriIn ( 
GX_TEVSTAGEO, 
GX_CC_ZERO, // a 
GX_CC_TEXC, // b 
GX_CC_RASC, // c 
GX_CC_ZERO ); // d 


GXSetTevColorOp ( 
GX_TEVSTAGEO, 
GX_TEV_ADD, 
GX_TB_ZERO, 
GX_CS_SCALE 1, 
GX_ENABLE, 
GX_TEVPREV) ; 


Equation 24 - Modulate 2X 


C aPC. 

Code 62 - Modulate 2X 

GXSetTevColoriIn ( 
GX_TEVSTAGEO, 
GX_CC_ZERO, // a 
GX_CC_TEXC, // b 
GX_CC_RASC, khoe 
GX_CC_ZERO ); // d 


Equation 25 - Add 


CLC, +, 

Code 63 - Add 

GXSetTevColoriIn ( 
GX_TEVSTAGEO, 
GX_CC_TEXC, // a 
GX_CC_ZERO, // b 
GX_CC_ZERO, MNS 
GX_CC_RASC ); //d 


Equation 26 - Subtract 


c,=c,-¢, 

Code 64 - Subtract 

GXSetTevColorIn ( 
GX_TEVSTAGEO, 
GX_CC_RASC, // a 
GX_CC_ZERO, // b 
GX_CC_ZERO, TEC. 
GX_CC_TEXC ); // d 
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GXSetTevColorOp ( 
GX_TEVSTAGEO, 
GX_TEV_ADD, 
GX_TB_ZERO, 
GX_CS_SCALE 2, 
GX_ENABLE, 
GX_TEVPREV) ; 


GXSetTevColorOp ( 
GX_TEVSTAGEO, 
GX_TEV_ADD, 
GX_TB_ZERO, 
GX_CS_SCALE 1, 
GX_ENABLE, 
GX_TEVPREV) ; 


GXSetTevColorOp ( 
GX_TEVSTAGEO, 
GX_TEV_SUB, 
GX_TB_ZERO, 
GX_CS_SCALE 1, 
GX_ENABLE, 
GX_TEVPREV) ; 


// 
// 
// 
// 
// 


// 
// 
// 
// 
// 


// 
// 
// 
// 
// 


// 
// 
// 
// 
// 
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op 

bias 

scale 

clamp 0-255 
output reg 


op 

bias 

scale 

clamp 0-255 
output reg 


op 

bias 

scale 

clamp: 0-255 
output reg 


op 

bias 

scale 

clamp 0-255 
output reg 
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Equation 27 - Blend 


Cc, =0,(1.0- 4,)+ cA, 
Code 65 - Blend 


GXSetTevColorin ( GXSetTevColoroOp ( 
GX_TEVSTAGEO, GX_TEVSTAGEO, 
GX_CC_TEXC, // a GX_TEV_ADD, // op 
GX_CC_RASC, // b’ GX_TB_ZERO, // bias 
GX_CC_TEXA, //c GX_CS_ SCALE 1, // scale 
GX_CC_ZERO ); // a GX_ENABLE, // clamp 0-255 
GX_TEVPREV) ; // output reg 


9.9 Alpha compare function 
You can apply the alpha compare operation to the alpha output from the last active TEV stage. 
Code 66 - GXSetAlphaCompare 


GXSetAlphaCompare ( 
GXCompare compO, 
us refo, 


GXAlphaOp Op, 
GXCompare compl, 
u8 refl ); 


The alpha compare operation actually consists of two separate compares, the results of which can be com- 
bined using several logical operations. 


Equation 28 - Alpha compare 


Asie compl) Ago) op vets comp | A) 
For example, the following compare is possible: 


Equation 29 - Sample alpha compare 


tae = Aveo} AND (A s Aven) 


source 


The result of the alpha compare is a Boolean condition, true or false. The result of the alpha compare is 
used to conditionally write the pixel color (and possibly Z) to the frame buffer. 


The following compare operations are possible: never, less, less or equal, equal, not equal, greater, 
greater or equal, always. The following combine operations are possible: and, or, exclusive-or, exclusive- 
nor. 


The effect of the alpha compare on the Z buffer depends on whether Z buffering occurs before or after tex- 
ture lookup (see GXSetZCompLoc). If Z buffering occurs before texture lookup, then the Z write condition 
is determined only by the Z compare. If Z buffering occurs after texture lookup, then the alpha compare 
result and the Z compare result are logically ANDed to determine whether the color and Z are written to the 
frame buffer. In general, if alpha compare is enabled, Z buffering should occur after texture lookup. 
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9.10 Ztextures 


Z textures are always looked up in the last TEV stage. The texture input of the last active TEV stage is con- 
nected directly to the Z buffer logic; therefore, it is not possible to apply TEV operations to the Z texture. 
The color is still output from the last TEV stage when Z textures are enabled, but the texture input of the 
last stage is occupied by the Z texture so it cannot be used as a color source. However, all other color 
inputs and all TEV operations of the last stage can be used. The alpha side of the TEV stage is not 
affected by Z textures. 


Refer to "8 Texture mapping" on page 57 for information on using Z textures. Refer to the Advanced Ren- 
dering section for information on applications of Z textures. 

9.11 Texture pipeline configuration 

Each TEV stage requires: 

* A texture map (GXTexMapID). 

« A texture coordinate (GXTexCoordID) to use for the texture map. 

* Accolor channel (GX_COLOROAO or GX_COLOR1A1) to rasterize. 

* Modes and controls for the TEV stage itself. 

The first three items are supplied using the following function: 

Code 67 - GXSetTevOrder 


GXSetTevOrder ( 
GXTevStageID stage, 
GXTexCoordID coord, 
GXTexMapID map, 
GXChannelID color ); 


Let’s take each parameter in turn. The stage parameter is the TEV stage that you are configuring. The 
coord parameter is the name of the texture coordinate used to look up the texture. The coord is generated 
according to the function GxSet TexCoordGen. The map parameter is the name of the texture to use. If no 
coord or map is to be used in this stage, they should be set to Gk_TEXCOORD_NULL and 

GX _TEXMAP NULL, respectively. This map ID is associated with a texture using the G<LoadTex0bj func- 
tion. The color parameter is used to name the output color channel of the vertex lighting hardware. The GP 
can only rasterize one color channel per clock, so you must choose whether to rasterize GX_COLOROAO or 
GX_COLOR1A1 for this TEV stage. 








You can think of the G<Set TevOrder function as a switchboard with which you can connect all the inter- 
ested parties together. Similarly, the function GXSet TexCoordGen can be seen as a switchboard that con- 
nects and transforms input vertex data to texture coordinates. 


Note: While only eight texmaps and eight texcoords may be specified, up to 16 different texture lookups 
can be performed. However, since texture coordinates are scaled based on the size of the associ- 
ated texture map, you can only re-use or “broadcast” a texture coordinate to texture maps that 
have the same size. Each TEV stage will perform its own texture lookup based upon all of its set- 
tings. 
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The following diagram shows the relationships between different parts of the texture pipeline and the func- 
tions that control them and their associations. 
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Figure 47 - Texture pipeline control 
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GXInit sets the default texture coordinate generation capability. This function configures texture coordi- 
nate generation to copy each input texture coordinate directly to an output texture coordinate using an 


identity matrix. G<Init sets the number of color channels to 0 (GxSetNumChans). 


Once you understand the default configuration, you can choose to override the texture coordinate genera- 


tion using GXSet TexCoordGen, or the TEV wiring using GXSet TevOrder, or both. 


Note: GxXSetTevOrder determines which texture and rasterized color inputs are available to each TEV 
stage, while GxSet TevColoriIn and GXSetTevAlphatIn determine how the various inputs plug 
into the TEV operation for each stage. Also, GXSet TevColorOp/GXSetTevAlphaOp determine 
how the stages connect together. 
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10 Indirect texture mapping 


The Graphics Processor has a powerful indirect texture feature. It allows the colors looked up from one 
(indirect) texture lookup to be transformed into offsets that are then added to the texture coordinates for 
another (regular) texture lookup. Various operations are possible when combining the output of the indirect 
stage with the coordinates for the regular stage. 


Figure 48 - Indirect texture operation 
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Indirect textures have several possible applications: 
¢ Texture warping. 

¢ Texture tile maps. 

* Pseudo-3D textures. 

¢« Environment-mapped bump mapping. 


Using indirect textures for texture warping effects is the simplest application of the indirect feature. In this 
case, the indirect texture is used to stretch or otherwise distort the surface texture. You can achieve a 
dynamic distortion effect by swapping indirect maps (or by modifying the indirect map or coordinates). You 
may apply this effect to a given surface within a scene, or you can take this one step further and apply the 
effect to the entire scene. In the latter case, the scene is first rendered normally and then copied to a tex- 
ture map. You then draw a big rectangle that can be mapped to the screen using an indirect texture. Tex- 
ture warping allows for shimmering effects, special lens effects, and various psychedelic effects. 


The indirect feature also lets you draw texture tile maps. In this case, one texture map holds the base defi- 
nition for a variety of tiles. An indirect texture map is then used to place specific tiles in specific locations 
over a 2D surface. Normally, this effect is accomplished without indirect textures by drawing a polygon for 
each desired tile. With indirect textures, only one polygon need be drawn. 


Figure 49 - Tiled texture mapping 
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Tile mapping can be extended to become a pseudo-3D texture effect. Consider all the tiles to be part of a 
stack. Rather than drawing just a single tile layer from the stack, you can select two adjacent layers and 
blend between them. You can use this technique to cover a large surface with non-repeating patterns that 
blend together smoothly. You might imagine covering a beach with such a texture, where the layers vary in 
appearance from fine sand to small pebbles to larger rocks. 


Figure 50 - Pseudo-3D textures 
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Layer Select Results 





49 =selectlayers4ands 1 4.9= 0.1 * (layer 4 definition ) + 
use blend factor 0.9 T1 0.9 * ( layer 5 definition ) 

Environment-mapped bump mapping (EMBM) is another use of indirect textures. Regular emboss bump 
mapping considers only the interaction between the bump map and a single diffuse light source. With envi- 
ronment-mapped bump mapping, you use an indirect bump texture to offset texture coordinates generated 
via surface normals. The perturbed normals are then used to look up an environment texture. The environ- 
ment texture may contain complicated lighting effects, or it may be a reflection map of the environment. 
(One caveat is that the environment map is viewpoint-dependent; i.e., it must be regenerated whenever 
the viewpoint changes.) There are two EMBM methods: 


* The normal perturbations are modeled with respect to a flat surface (dS, dT), and during runtime they 
are transformed onto arbitrary surfaces. This method requires three TEV stages plus one indirect 
stage to calculate the modified texture coordinates. 


¢ The normal perturbations are modeled in 3D (dX, dY, dZ), and during runtime they are matched with 
specific surfaces and transformed into eye space. This method requires only one TEV stage plus one 
indirect stage to compute. 
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10.1 Setting up indirect texture stages 


In the figure below, we illustrate the texture hardware as described in the previous chapters. For each TEV 
stage, you may assign a texcoord and a texture map to be looked up. Texture lookup and processing 
occurs in each TEV stage. 


Figure 51 - Regular texture functional diagram 
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We now describe the indirect texture hardware in more detail. As mentioned above, an indirect lookup con- 
sists of an indirect lookup stage, after which the results are combined with the regular texture coordinates 
for another (regular) lookup stage. There can be up to four different indirect lookup stages. The results 
from any indirect lookup may be used in any TEV stage to modify a regular texture lookup. Thus there can 
be up to 16 modified regular lookups. 


Note: The total number of texture maps available remains at 8. In addition, any texture map that is desig- 
nated as being used for an indirect lookup cannot also be used as a target for a regular lookup 
(and vice versa—a given texture map can be only one or the other). 


Performance-wise, adding an indirect stage is like adding another TEV stage. It increases the time 
required to process each quad (2x2 pixels) by an additional clock cycle. 
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In the figure below, we add in the indirect hardware as it has been described thus far. 


Figure 52 - Regular and indirect texture functional diagram 
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The figure also shows some of the relevant functions. First is G<SetNumIndStages, which controls how 
many indirect texture lookups there will be. For each indirect stage, you must specify a texture coordinate 
and map. This is done using GXSet IndTexOrder: 


Code 68 - GXSetindTexOrder 


void GXSetIndTexOrder (GXIndTexStageID stage, 
GXTexCoordID coord, 
GxTexMapID map); 


As mentioned, a map that is assigned to an indirect stage in this way cannot also be assigned to a regular 
TEV stage as well. A given map can only be indirect or regular. 


On the other hand, a given texture coordinate can be shared by both an indirect lookup and a regular 
lookup. If the indirect map is the same size as the regular map, then the sharing is straightforward. If the 
sizes differ, there is still the possibility to share. If the regular map is larger than the indirect map, you may 
scale it down by a power of two for use with the indirect lookup while still using the unscaled texture coor- 
dinate for the regular lookup. This is set by the following call: 


Code 69 - GXSetindTexScale 


void GxXSetIndTexCoordScale (GXIndTexStageID indStage, 
GXIndTexScale scales, 
GXIndTexScale scaleT); 


RVL-06-0037-001-A 
Released: February 27, 2006 


© 2006 Nintendo 
CONFIDENTIAL 


Basic indirect texture processing 107 


The figure below shows an example for this case. The block labeled “Scale to Map 2 size” is the regular 
texture coordinate scaling that the hardware does. The block labeled “Scale by 1/4” would happen as a 
result of calling GXSet IndTexCoordScale with parameters specifying 1/4 scaling. 


Figure 53 - Texture coordinate sharing example 
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If we share texture coordinates for this example and set the indirect scaling to 1, then the texture coordi- 
nate would either wrap or clamp as it accesses the indirect texture. The behavior depends upon the set- 
tings for the indirect map being looked up. 
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10.2 Basic indirect texture processing 


We now begin to describe what is possible inside the “Op” block shown above, where the colors coming 
from the indirect lookup are processed and combined with the regular texture coordinate. The figure below 
shows some of the major components of the indirect texture processing block. More details will be shown 
in a later section. 


Figure 54 - Indirect texture processing, part 1 
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When the indirect looked-up color enters the processing box, the first thing that happens is a mapping of 
the color components to the (s, ¢, u) texture offsets. As shown, red (R) is discarded, green (G) maps to u, 
blue (B) maps to ¢t, and alpha (A) maps to s. 


Note: Some remapping may be done later through the use of the indirect matrix (described below). We 
chose this initial mapping in order to make efficient use of IA8 texture maps. 


Next, the texture offsets go through the format block. You may choose whether some or all of the bits of 
each component are actually used; that is, specifically, you may choose whether the lower 3, 4, 5, or 8 bits 
of each component are used. Normally, you will use all 8. (The reason for the partial choices is described in 
"10.5.1 Texture tiling and pseudo-3D texturing" on page 111.) 
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Next, you can apply a bias. If 8 bits were chosen in the format block, then the bias is -128, allowing for 
signed offsets. If a smaller number of bits was chosen in the format block, then the bias is +1. This is a spe- 
cial case that will also be explained in section 10.6. 


The next operation is a matrix-vector multiply. You may configure up to three different static matrices and 
choose which one to use for a given indirect texture operation. The components of the matrix are in the 
range [-1 ... +0.999] (a sign bit plus 10 bits of mantissa). As shown, the matrix has 2 rows and 3 columns, 
and it appears on the left side of the multiply. On the right side is a column vector consisting of the (s, ¢, u) 
offsets. The matrix may be used for rotation, scaling, and remapping of the offsets. 


A scale value is associated with each static matrix. You may choose a scale value by specifying an expo- 
nent of 2 in the range of [-17 ... +46]. This scale value can be used to stretch the offsets over the size of 
the regular texture map that will be associated with this indirect operation. You can also think of the scale 
as the fixed-point exponent for the matrix values. Since a value equal to +1.0 cannot be stored in the 
matrix, you can store 0.5 and use an exponent of 1 greater than the desired value in order to compensate. 


You can optionally wrap the regular texture coordinate used with this lookup operation. You may specify a 
wrap value of 0, 16, 32, 64, 128, 256, or none. Using 0 effectively zeroes out the regular texture coordinate 
values. 


Once all of the above processing has taken place, the offsets are added to the regular texture coordinates. 
This becomes the effective texture coordinate that will be used in the texture lookup for the associated TEV 
stage. 

10.3 Basic indirect texture functions 


We now describe more indirect texture functions. Use this function to set an indirect matrix and scale 
value: 


Code 70 - GXSetindTexMtx 


void GxXSetIndTexMtx( GXIndTexMtxID mtx_id, 
£32 offset[2] [3], 
s8 scale exp); 


As mentioned above, there are three indirect matrices, and mtx_id specifies which one to set. The offset 
values must be in the range of [-1 ... +0.999] (sign plus 10-bit mantissa), and scale_exp must be in the 
range of [-17 ... +46]. 


The following group of functions will set up the indirect hardware to enable a particular special effect. 

Note: These functions only set the indirect hardware, and you must still set up the TEV, texture maps, 
texgens, and other parts of the system in order to achieve the desired effect. 

10.3.1 Texture warping 

Code 71 - GXSetTevindWarp 


GXSetTeviIndWarp ( 
GXTevStageID tevStage, // Name of TEV stage being modified 
GXIndTexStageID indStage, // The indirect stage that specifies the warp map 
GXBool signedOffsets, // True for s8 offsets, false for u8 offsets 
GXBool replaceMode, // True to replace, false to modify regular coords 
GXIndTexMtxID matrixSel); // Which indirect matrix and scale to use 
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This function provides the ability to warp a regular texture lookup. The indirect map should contain 8-bit off- 
sets. The signedOffsets parameter controls whether or not a bias of -128 is applied to the offsets. The 
replaceMode parameter controls whether the offsets completely replace the regular texture coordinates 
(GX_ TRUE), or if they merely offset them (GX_ FALSE). In effect, this selects a zero wrap value or a wrap 
value of none. The matrixSel parameter chooses which of the available indirect matrices (and associated 
scale values) to use. 








The SDK graphics demo ind-warp shows one way to use this function. 


10.3.2 Environment-mapped bump-mapping (dX, dY, dZ) 
Code 72 - GXSetTevindBumpXYZ 


GXSetTevIndBumpXxyYZ ( 
GxXTevStageID tevStage, // Name of TEV stage being modified 
GXIndTexStageID indStage, // The indirect stage that specifies the bump map 
GXIndTexMtxID matrixSel); // Which matrix/scale slot to use 


GXSetTevIndBumpXyYZ sets up an environment-mapped bump-mapped (dX, dY, dZ) texture lookup. This 
is basically a perturbed lookup into a spherical reflection map. As an indirect operation, this is just a warp 
with signed offsets. The bump map must contain biased normal offsets in 3D model space. It must be an 
RGBA texture with dX in A, dY in B, and dZ in G. You must load the indirect matrix with a transform for nor- 
mals that goes from model space to eye space. The scale value must contain the size of the reflection map 
divided by 2 (and thus the reflection map must be a square power of 2 size). You must set up a normal- 
based texgen for the regular texture coordinate. Alternately, you may avoid sending vertex normals by put- 
ting normals (not just offsets) in the texture map. See the Advanced Rendering section in this guide for 
more details on bump mapping. Also, refer to the demo ind-bump-xyz. 
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10.4 Advanced indirect texture processing 


Before moving on to the next indirect functions, let’s examine more details in the indirect texture process- 
ing block. These details are illustrated in the following figure, which builds upon "Figure 54 - Indirect texture 


processing, part 1" on page 107. 
Figure 55 - Indirect texture processing, part 2 
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The figure above shows four additional features of the indirect processing hardware: 

*« You may select a “bump alpha” value to be extracted from the indirect lookup color. 

« Adynamic matrix may be created, based upon the incoming regular texture coordinates. 

* You may select whether the original or modified texture coordinate is used to compute texture LOD. 


« You may feed the texture coordinate calculated from one stage as an additive input to the next. 


10.4.1 Selecting “bump alpha” 


You may specify that certain bits from the indirect looked-up color be available as “bump alpha.” Which bits 
are used depends upon the format selected for the indirect offsets. If you chose to use 3, 4, or 5 bits from 
the colors as offsets, then the remaining 5, 4, or 3 bits are available for selection as bump alpha. If you 
chose the 8-bit format, then the upper 5 bits may be duplicated as bump alpha. In any case, you choose 
whether to extract the bits from the s, f, or u component; the resulting bits are left-aligned in the bump- 
alpha 8-bit field. Bump alpha is available as a TEV-stage color input (by using GkSet TevOrder). You may 
select the plain bump alpha, or a “normalized” bump alpha (i.e., a bump alpha multiplied by a factor of 255/ 


248). 
Note: There is a restriction on this feature: bump alpha is not available for TEV stage 0. 
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10.4.2 Dynamic matrices 


There are more choices for the indirect matrix. You may select from the three static matrices, two types of 
dynamic matrices, or a matrix of all zeros. For each matrix selection (except the zero matrix), there is a cor- 
responding scale selection as well. If you use a dynamic matrix, you may choose one of the three scale 
values associated with the static matrices. Dynamic matrices are set based upon the incoming regular s/t 
coordinate values: 


Equation 30 - Dynamic indirect matrices 


Roe “c" matri s/256 0 0 Pe "t" matri 0 sf/256 0 
2 S PIairix : 2 PRALITX 

£/256 0 0 = 0 £/256 0 
10.4.3 Selecting texture coordinates for texture LOD 


You may select to use the original (unmodified) or the modified texture coordinates for the MIPMAP LOD 
computation. When doing texture tiling (described below), you should use the unmodified texture coordi- 
nates for this purpose. In most other cases, you should use the modified coordinates. 

10.4.4 Adding texture coordinates from previous TEV stages 


You may choose to add in the texture coordinate computed in the previous TEV stage. This allows even 
more complicated expressions to be built up over multiple stages. It also allows a complicated result to be 
reused for more than one lookup. 

10.5 Advanced indirect functions 

We now describe functions that take advantage of the features mentioned in "10.4.4 Adding texture coordi- 
nates from previous TEV stages" on page 111. 

10.5.1 Texture tiling and pseudo-3D texturing 


Code 73 - GXSetTevindTile 


GxXSetTevindTile ( 
GXTevStageID tevStage, // Name of TEV stage being modified 
GXIndTexStageID indStage, // The indirect stage that specifies the tile map 
ul6 tileSizes, // Size of tile in S dimension 
ul6é tileSizeT, // Size of tile in T dimension 
ul6é tileSpacings, // Tile spacing in S dimension 
ul6 tileSpacingT, // Tile spacing in T dimension 
GXIndTexFormat format, // Format of indirect offsets 
GXIndTexMtxID matrixSel, // Which indirect matrix slot to use 
GXIndTexBiasSel biasSel, // For pseudo-3D, selects tile-stacking direction 
GXIndTexAlphaSel alphaSel); // For pseudo-3D, selects bump alpha 


This function specifies texture tiling or pseudo-3D texture lookup. You specify the tile size and spacing sep- 
arately. Using a spacing different than the tile size allows borders for mipmapping purposes. Depending 
upon the height of the mipmap stack, texels outside of the tile area may be included in the filtering calcula- 
tions for mipmapping. This function will set up the matrix values and scale value appropriately based upon 
the given inputs; you need only to specify which matrix slot to use. The biasSe/ and alphaSel are used only 
for pseudo-3D lookups (see below). You set these to GX_ITB NONE and GX_ITBA_ OFF, respectively, for 
normal 2D tiling. 





Note: Texture tiling can take advantage of the same texture coordinate for use with the indirect map and 
the regular map. There is a complication, however, since the desired scale values for the regular 
texture coordinates are not directly related to the size of the regular map, which contains the tile 
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definitions (refer back to "Figure 49 - Tiled texture mapping" on page 103). Normally, GX will set 
the texture coordinate’s scale size to the size of the map being looked up, with preference for the 
regular map size if a texture coordinate is being shared. Since you need to use a different scale 
altogether with texture tiling, you must use this function: 


Code 74 - GXSetTexCoordScaleManually 


GXSetTexCoordScaleManually ( 


GXTexCoordID texCoord, // Name of the texcoord being affected 

GXBool enable, // GX_TRUE = manual scaling; GX_FALSE = automatic 
ul6é ss, // Manual scale value for S dimension 

ul6é ts ); // Manual scale value for T dimension 





Once GXSet TexCoordScaleManually has been called with enable set to GX_ TRUE, the given texture 
coordinate scale values are fixed until the function is called again. If the function is called with enable set to 
GX_ FALSE, then automatic texture coordinate scaling takes over once again for that texcoord. 





Note: When you are manually scaling, you should also call GkxSet TexCoordBias. The bias is normally 
set automatically by the GX API, but when a texture coordinate is being scaled manually, the bias 
is no longer modified by GX and will be stale from the last time it was set. 


For texture tiling, the desired texture coordinate scale is the tile size multiplied by the size of the indirect 
map. You then use GXSet IndTexCoordScale to divide out the tile size for use in accessing the indirect 
map. For an example of texture tiling, refer to the demo ind-tile-test. 


In order to support pseudo-3D texture lookup, you must call GXSet TevIndTile for two adjacent TEV 
stages. The first stage resembles a normal 2D tiling specification. For the second stage, you specify a bias 
select and alpha select. The bias is used to select the tile-stacking direction. You use GX_ITB_S when the 
next tile is offset in s, and GX_ITB_T when the next tile is offset in t. You then choose a bump alpha in 
order to blend between the tile from the first lookup and the tile from the second lookup. 


Note: You cannot use the 8-bit format for pseudo-3D. Instead, you must use the 3-, 4-, or 5-bit format. 
These formats use a bias value of +1 instead of -128. The +1 bias is used to get the “next” tile in 
the second stage. Refer to the demo ind-pseudo-3d to see one way to use this feature. 

10.5.2 Environment-mapped bump-mapping (dS, d7) 


Code 75 - GXSetTevindBumpST 


GXSetTeviIndBumpsT ( 
GXTevStageID tevStage, // Name of first TEV stage to insert EMBM lookup 
GXIndTexStageID indStage, // The indirect stage that specifies the bump map 
GXIndTexMtxID matrixSel); // Which scale/matrix slot to use 


This function sets up an environment-mapped, bump-mapped (dS, dT) texture lookup. Similar to 
GXSetTeviIndBumpxyYZ, this sets up a perturbed lookup into a spherical reflection map. The difference is 
that the bump map in this case contains deltas for (s, t). Such a lookup requires three TEV stages to com- 
pute the offset texture coordinates. The resulting texture coordinate will be available two stages after the 
one specified in the call. This function makes use of the dynamic matrices in order to transform the (s, f) 
offsets to be relative to the incoming regular (s, ft) (which come from the object normals). 


You must set up the desired offset scale value using GXSet IndTexMtx. The scale value must contain the 
size of the reflection map divided by 2 (and thus the reflection map must be a square power of 2 in size). 


Note: No static matrix is actually used in the texture coordinate computation. Only dynamic matrices and 
the scale value are used. 
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The geometry associated with this lookup must include normals, binormals, and tangents. You must set up 
three normal-based texgens for the regular texture coordinates. The binormal texgen goes to the first TEV 
stage, the tangent texgen goes to the second stage, and the normal texgen goes to the third stage. An 
additional texgen is used for the indirect coordinate. You should use an IA8 texture format for the bump 
map, with s offsets in alpha (A) and t offsets in intensity (I). You must set up the first two TEV stages used 
so that they do not actually look up the textures (by using GX_TEX DISABLE). 








Refer to the demo ind-bump-st to see this kind of environment-mapped bump-mapping in action. Also 
refer to the Advanced Rendering section for more details on bump mapping. 


Having used three TEV stages to compute the texture coordinate for an EMBM (dS, dT) lookup, you can 
use the result to do more than one lookup. In order to perform successive lookups without taking three 
stages to compute each one, use the texture coordinate feedback feature. You may call 
GXSetTeviIndRepeat to set this up: 


Code 76 - GXSetTevindRepeat 


GXSetTeviIndRepeat ( 
GXTevStageID tevStage) ; // Name of TEV stage being modified 


This function allows you to use the texture coordinates computed in the previous TEV stage for the named 
TEV stage. It is typically used only after G<Set TevIndBumpST. 


10.5.3 General indirect texturing 


The functions described so far implement the most obvious uses of the indirect texture hardware. There is 
one more function available to set the indirect texture hardware directly, in case developers think of addi- 
tional uses for the hardware. 


Code 77 - GXSetTevindirect 


GXSetTeviIndirect ( 
GXTevStageID tevStage; // TEV stage name 
GXIndTexStageID indStage; // the indirect stage to be used with this TEV stage 
GXIndTexFormat format; // format of the indirect texture offsets 
GXIndTexBiasSel biasSel; // Selects which offsets (S, T) receive a bias 
GXIndTexMtxID matrixSel; // Selects which indirect matrix and scale to use 
GXIndTexWrap wraps; // Wrap value of regular S coordinate 
GXIndTexWrap wrapT; // Wrap value of regular T coordinate 
GXBool addPrev; // Add output from previous stage to texture coords? 
GXBool utcLOD; // Use unmodified texture coordinates for LOD calc? 
GXIndTexAlphaSel alphaSel); // Selects indirect texture alpha output 











GXSetTevIndirect allows you to set the various parameters for a given indirect operation manually. 


Note: The indirect texture coordinate processing block is used for all TEV stages. If you do not want to 
perform a modified lookup, select the zero texture matrix and turn off wrapping, feedback, and 
bump alpha. We provide a convenience function to do this: 


Code 78 - GXSetTevDirect 


GxXSetTevDirect ( 
GxXTevStageID tevStage) ; // TEV stage name 
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11 Fog, Z-compare, blending, and dithering 


The final pixel output operations include fog, Z-compare, blending, and dithering. These operations are the 
final steps of the pixel pipeline before a pixel is either written to the frame buffer or discarded. Fog allows 
pixel values to be blended with the fog color based upon the distance from the viewer. The Z-compare 
operation determines whether or not rasterized pixels will be written to the frame buffer based upon a Z 
value comparison. The blending operation allows the rasterized pixel color to be mixed with the color exist- 
ing in the frame buffer. Logic operations are also possible in the blender. Dithering takes place last. 


11.1 Fog 


Fog, if enabled, blends a constant fog color with the pixel color output from the last active Texture Environ- 
ment (TEV) stage. The percentage of fog color blended depends on the fog density, which is a function of 
the distance from the viewpoint to a quad (2x2 pixels). 


There are five possible fog density functions: 

¢ — Linear. 

¢« Exponential. 

« Exponential squared. 

« Reverse exponential. 

« Reverse exponential squared. 

You may program a near and far Z for the fog function independent of the clipping near and far Z. 


The eye-space Z used for fog computations does not represent the correct range unless the viewer is fac- 
ing the same direction as the z-axis. The GP can compensate for this with a range adjustment factor based 
upon the x position of the pixels being rendered. This boosts the eye-space Z value used for the fog com- 
putation (the y direction is not compensated), effectively increasing the fog density towards the edges of 
the screen in order to make the effect more realistic. 


Figure 56 - Fog range adjustment 
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11.1.1 Fog curves 


These curves show the fog density as a function of range with startz = 50 and endz = 100. 


Fog Density 
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Figure 57 - Linear fog curve 
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Figure 58 - Exponential fog curve 
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Figure 59 - Exponential squared fog curve 
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Figure 60 - Reverse exponential fog curve 


GxX_FOG_REVEXP — 


0.6 


0.6 


o4 


0.2 





0 
0 20 40 60 80 100 
Range 
Figure 61 - Reverse exponential squared fog curve 
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11.1.2 Fog parameters 
You can control fog by using the following function: 
Code 79 - GXSetFog 


void GXSetFog(GXFogType type, 


£32 startz, 
£32 endz, 
£32 nearz, 
£32 farz, 
GXColor color); 


The parameters startz and endz control where the fog function starts and ends, respectively. Usually, the 
endz value is set to the far plane Z. The nearz and farz are needed to convert the rasterized screen space 
Z value into eye-space Z for fog computations. Color is the color of the pixel when the fog density is 1.0. 


The horizontal fog range adjustment is turned off by default in GXInit. In order to use this feature, you 
must call the following two functions: 


Code 80 - Fog range adjustment functions 


void GXInitFogAdjTable ( 


GXFogAdjTable* table, 
ul6é width, 
£32 projmtx[4][4] ); 


void GxXSetFogRangeAdj ( 


GXBool enable, 
ul6é center, 
GXFogAdjTable* table ); 


The first function is used to compute the adjustment table. The user must provide the allocated space for 
this table. The width parameter specifies the width of the viewport. The projmtx parameter is the projection 
matrix that will be used to render into the viewport. This parameter is needed for the function to compute 

the viewport’s horizontal extent in eye space. 


Once the fable has been computed, it can be passed to GXSet FogRangeAdj. The enable parameter indi- 
cates whether horizontal fog range adjustment is enabled or not. The center parameter should be the x- 
coordinate at the center of the viewport. The range adjust function is symmetric about center. 


11.2 Z-compare 


You may write to the frame buffer conditionally by comparing the Z value for the rasterized pixel against the 
Z value for the pixel already in the frame buffer. This comparison may happen at one of two places within 
the graphics pipeline: the comparison may take place before a pixel is textured, or it may take place after 
texturing has been done (see "Figure 1 - Schematic of the GP" on page 10). Normally, the Z-compare 
occurs before texturing because this typically enhances performance; that is, memory bandwidth is 
reduced by not looking up textures for pixels that will not be visible. However, the alpha-compare logic is 
tied together with the Z-compare logic, so using alpha-compare requires placing the Z-compare after tex- 
turing. Using Z textures also requires that the Z-compare occur after texturing. The function 
GXSetZCompLoc sets whether the Z-compare happens before or after texturing. The compare is set to 
“before” in GXInit. 
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The function GXSet ZMode is used to control the Z-compare: 
Code 81 - GXSetZMode 


void GXSetZMode ( 


GXBool compare enable, 
GXCompare func, 
GXBool update enable ); 


The compare_enable parameter can be used to disable the Z-compare altogether. When compare_enable 
is false, writes to the Z buffer are also disabled. The func parameter sets the comparison function (never, 
less, less or equal, equal, not equal, greater or equal, greater, or always). The update_enable parameter 
sets whether or not new Z values are written to the Z buffer when Z-compares are enabled. 


11.2.1. Z buffer format 


The Z buffer is 24 bits wide in non-antialiased mode, and 16 bits wide in antialiased mode. When using a 
16-bit Z buffer, a number of different compressed formats are available to make better use of the limited 
number of bits. The compression algorithm performs a type of reverse floating point encoding because the 
properties of screen space Z necessitates clumping most of the resolution towards the high end of the 
number scale, whereas conventional floating point notation clumps most of the resolution towards the low 
end of the number scale. 


The system supports various compression schemes, with selection being made based on the far-to-near 
ratio. For orthographic projection or small far-near ratios, you can use a linear format; it just strips the lower 
8 bits from the input Z. For medium far-near ratios, you can use a 14e2 format that has an effective resolu- 
tion of 15 bits at the near plane and 17 bits at the far plane. For high far-near ratios, you can use a 13e3 
format that has an effective resolution of 14 bits at the near plane and 20 bits at the far plane. A 12e4 for- 
mat is also available. 


The 16-bit formats available are listed below: 


Table 12 - 16-bit Z buffer formats 

















Format name DY =x-Yoq al oy d (oY) Use When 
GX_ZC_LINEAR Linear far/near <= 2416 
GX_ZC_NEAR 14e2 far/near <= 2418 
GX_ZC_MID 13e3 far/near <= 2420 
GX _ZC_FAR 12e4 far/near <= 2424 

















It is always best to use as little compression as possible (i.e., use as many mantissa bits in the Z format as 
possible). You get less precision with higher compression. The “far” in the above text does not necessarily 
refer to the far clipping plane. You should think of it as the farthest object you want correct occlusion for. 


The Z buffer format is set using GkSet Pixel Fmt. This function affects both the color buffer format and the 
Z buffer format. For information about color buffer formats, refer to "12 Video output" on page 125. 


Note: Changing the frame buffer format requires flushing the pixel pipeline. 
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11.3. Blending 


The blending operation combines the pixel color (output from the TEV and fog operations), also called the 
source color, with Embedded Frame Buffer (EFB) color, also called the destination color. The pixel’s alpha, 
or source alpha, can always be used as a factor in the blending operation. In addition, if the EFB format is 
GX PF _RGBA6 224, then you can blend the source alpha channel with the EFB alpha, also called desti- 

nation alpha. 


When rendering non-antialiased images, four pixels per clock are blended. When rendering antialiased 
images, six samples, or two pixels per clock, are blended. 


You can set the main blending controls using: 
Code 82 - GXSetBlendMode 


void GxXSetBlendMode ( 
GXBlendMode type, 
GXBlendFactor src factor, 
GxXBlendFactor dst factor, 
GXLogicOp op }? 





The type parameter selects between blending operations, GX_BM BLEND, or logical operations, 
GX BM LOGIC. Setting type to Gx BM NONE writes the source pixel directly to the EFB. Call 
GXSetColorUpdate and GXSetAlphaUpdate to enable the writing of the blending result to the EFB. 





11.3.1 Blend equation 
The blend equation is: 
Equation 31 - Blending 
pix_color = Clamp (src_factor x src_color + dst_factor x dst_color) 


The RGB blenders are identical. The alpha blender differs in that a constant alpha can override the result 
of the blend equation using GXSetDstAlpha. 


The src_factor and dst_factor are both normalized. The normalization process adds the most significant bit 
(MSB) of the factor to itself, thus 0 through 127 are unchanged and 128-255 will map to 129 to 256. The 8- 
bit color is multiplied by the 9-bit factor and the result is rounded. The result of the two multiplies are added 
and the sum is clamped to 255. This technique allows colors to be multiplied by 1.0 (255 = 1.0), preserving 
the high end of the color range. 


Note: HW2 adds a new blend function; refer to "15 GX updates for HW2" on page 155 for details. 
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11.3.2 Blending parameters 
The following table lists the possible values of src_factor and dst_factor. 


Table 13 - Blending parameters 















































Yo} U I xex-mo) mm DI--J tl ar-1t(ol tM ar-(ea co) g I NX=Xe| Green Blue Alpha 
GX_BL_ZERO 0 0 0 0 
GX_BL_ONE Oxff Oxff Oxff Oxff 
GX_BL_SRCCLR Rs Gs Bs As 
GX_BL_INVSRCCLR Oxff — Rs Oxff — Gs Oxff — Bs Oxff — As 
GX_BL_DSTCLR Rd Gd Bd Ad 
GX_BL_INVDSTCLR Oxff — Rd Oxff — Gd Oxff - Bd Oxff — Ad 
GX_BL_SRCALPHA As As As As 
GX_BL_INVSRCALPHA Oxff — As Oxff — As Oxff — As Oxff — As 
GX_BL_DSTALPHA Ad Ad Ad Ad 
GX_BL_INVDSTALPHA Oxff — Ad Oxff — Ad Oxff — Ad Oxff — Ad 
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11.3.3 Logic operations 


These logical operations are supported: 


Table 14 - Logic operations 


Logic Op Name Operation 
































GX_LO_INVAND 


GX_LO_REVAND 


GX_LO_CLEAR 0x00 

GX_LO_SET Oxff 

GX_LO_COPY Source 
GX_LO_INVCOPY ~Source 
GX_LO_NOOP Destination 
GX_LO_INV ~Destination 
GX_LO_AND Source & destination 
GX_LO_NAND ~(Source & destination 
GX_LO_OR Source | destination 
GX_LO_NOR ~(Source | destination) 
GX_LO_XOR Source “ destination 
GX_LO_EQUIV ~(Source “ destination) 


Source & ~destination 


~Source & destination 





GX_LO_REVOR 


Source | ~destination 








GX_LO_INVOR 





~Source | destination 


Logic operations and blend operations are mutually exclusive. 
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11.4 Dithering 


The pixel can be dithered after blending when the pixel format is either GX_PF RGBA6 224 or 
GX_PF_RGB565_ 216. There is no performance penalty for turning on dithering. Each 8-bit color compo- 
nent is scaled and normalized appropriately (see below) and the corresponding entry in the standard 4x4 
Bayer matrix is added. The Bayer matrix is screen-aligned and repeated over the entire screen. 


Equation 32 - Bayer matrix 
x —_> 


ge |o- 8. 2 Yo 
12 4 14 6 
3 14 1 «9 
15 7 13 5 


The following equations compute scaling and normalizing for dithering: 
Equation 33 - 5-bit dithering (ideal) 

Con = CCC, * 247/255) <<1) + dither[x &3, y &3]) >>4 
Equation 34 - 5-bit dithering (approximation actually used) 
Con =CCCy, -CCy, >> 5)) <<1)+4+dither[x &3, y &3]} >>4 
Equation 35 - 6-bit dithering (ideal) 

Com = (CC, *251/255) << 2+ dither[x & 3, y &3]) >> 4 
Equation 36 - 6-bit dithering (approximation actually used) 


Con =CCCm -CCm >> 6)) << 2) +4 dither[x & 3, y &3]) >>4 
You can enable dithering by calling the GxSetDither function. Dithering is enabled by default in GxInit. 
Code 83 - GXSetDither 


void GXSetDither( GXBool enable ); 


© 2006 Nintendo RVL-06-0037-001-A 
CONFIDENTIAL Released: February 27, 2006 


124 Graphics Library (GX) 


RVL-06-0037-001-A © 2006 Nintendo 
Released: February 27, 2006 CONFIDENTIAL 


125 


12 Video output 


The Embedded Frame Buffer (EFB) cannot send pixel data directly to the video interface; therefore, the 
frame buffer must first be copied out to main memory. We call the frame buffer in main memory the Exter- 
nal Frame Buffer (XFB). In this section, we will fully describe the copy operation necessary to transfer EFB 
to XFB. 


Since the GP copy operation relates closely to video interface functionality, the remainder of this section 
often refers to Video Interface library (VI). For a complete description of this library, see the Video Interface 
Library (VI) section in this guide. 

12.1 The copy pipeline 

The diagram below illustrates the operations applied during the frame buffer copying process. 


Figure 62 - EFB-to-XFB copy pipeline 
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12.1.1 Copy source 
The EFB source of the copy operation is specified by the following API: 
Code 84 - GXSetDispCopySrc 


void GxXSetDispCopySrc ( 
ul6 left, 
ul6 top, 
ul6 wd, 
ul6 ht ); 


GXSetDispCopySrc defines a sub-region of pixels in the EFB memory as the source for the copy opera- 
tion. Since the GP works on regions of 2x2 pixels, there is the restriction that all of the source copy param- 
eters be even numbers. 


12.1.2 Antialiasing and deflickering 


The GP performs antialiasing in two parts. During rendering, it can rasterize to a super-sampled EFB. Dur- 
ing the copy operation, the multiple samples per pixel can be filtered together to create the final pixel out- 
put color. In fact, samples from more than one row of pixels can be filtered together, enabling deflickering 
to be performed during the copy operation as well. Samples from up to three rows of pixels can be filtered 
together (with some restrictions). A more detailed discussion of Nintendo GameCube antialiasing and 
deflickering may be found in the Advanced Rendering section in this guide. 
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The antialiasing mode is determined by the frame buffer pixel format. "12.4 Embedded frame buffer for- 
mats" on page 131 discusses how to set this format. The following function sets all super-sample locations 
and sample filter weights: 


Code 85 - GXSetCopyFilter 


u32 GXSetCopyFilter ( 
GXBool aa, 
us sample pattern[12] [2], 
GXBool vf, 
us vfilter[7] ); 


The aa parameter indicates whether to use the supplied sample_pattern, or a default pixel-centered pat- 
tern. The sample_pattern parameter indicates the exact location of each pixel subsample. The vf parame- 
ter indicates whether to use the supplied vfilter, or a default single-line filter. The vfilter indicates the 
weights to use for each sample. Refer to the Advanced Rendering section for more details on these 
parameters. 


When rendering in non-antialiased mode, the sample pattern must be set to a pixel-centered pattern, or 
else visual anomalies will result. Similarly, the vertical filter must always be set correctly depending upon 
the render mode chosen. Note, however, that the vertical filter only samples from the adjacent rendered 
pixels (in the EFB). When rendering in field mode, such pixels are not adjacent during scan-out (since the 
odd and even fields from the XFB will be interlaced with each other); therefore, the vertical filter should not 
be used in this mode. Moreover, the vertical filter is unnecessary in double-strike mode. See below for fur- 
ther discussion about render modes. 


The same copy hardware is used for the video copy path as well as the texture copy path, thus it may be 
necessary to change the sample pattern and vertical filter within a frame when doing a texture copy fol- 
lowed by a video copy. 


Clamping is generally required when copying the first and last scan lines from the source rectangle. This 
makes sure that the GP uses valid data when sampling above the first scan line and below the last. How- 
ever, there may be times when it is necessary to disable clamping (see "12.2.7 Interlaced, antialiased 
frame-rendering, deflicker mode" on page 130). Clamping is controlled by this call: 


Code 86 - GXSetCopyClamp 





u32 GXSetCopyClamp( GXFBClamp clamp ); 


12.1.3 Gamma correction 


Pixel values may be gamma-corrected during the copy operation. Three choices of gamma correction are 
available: 1.0, 1.7, and 2.2. The default gamma set in GXInit is 1.0. 


Note: The gamma for texture copy is fixed at 1.0. The display-copy gamma can be set with the following 
function: 


Code 87 - GXSetDispCopyGamma 


u32 GXSetDispCopyGamma( GXGamma gamma ); 


Determining the appropriate gamma correction will depend on your design for the game. For many games, 
you find it preferable to maximize low intensity color resolution by setting the gamma correction at 1.0. For 
a more detailed discussion of gamma correction, refer to "2.1.3 Gamma correction" on page 7 of the Dem- 
onstration Library (DEMO) section. 


RVL-06-0037-001-A © 2006 Nintendo 
Released: February 27, 2006 CONFIDENTIAL 


The copy pipeline 127 


12.1.4 RGB to YUV 


A luma/chroma YUV format stores nearly the same visual quality pixel as RGB does, but requires only two- 
thirds of the memory. Therefore, we convert RGB EFB to YUV XFB during the copy operation to save on 
the amount of main memory used for the frame buffer. 


Note: There is a corresponding savings of main memory bandwidth as well (for both the copy operation 
and the XFB video scan-out). For conversion details, see "12.5 External frame buffer format" on 
page 132. 


As an example, consider a 640x240 (150,000 pixels) frame buffer. A frame buffer this size requires 900KB 
to double-buffer using 24-bit/pixel RGB format. However, a 16-bit/pixel YUV format requires only 6OOKB, 
which saves a difference of 300KB in main memory. 


12.1.5 Yscale 


The Nintendo GameCube can arbitrarily scale a rendered image both horizontally and vertically. Vertical y 
scale occurs during the copy process, while horizontal x scale occurs during the video display. For more 
details on scaling (x, y) for display, see "4.2 Initialization" on page 11 of the Video Interface Library (VI) sec- 
tion. The following function sets the y scale factor: 


Code 88 - GXSetDispCopyYScale 


u32 GXSetDispCopyYScale( £32 yscale ); 


The function returns the number of lines that will be copied, which can be used to compute the XFB size. 


12.1.6 Copy destination 
The destination of the copy operation is defined by: 
Code 89 - GXCopyDisp 


void GxXSetDispCopyDst( ul6 width, ul6 height ); 
void GxXCopyDisp( void *dest, GXBool clear ); 


GXSetDispCopyDst must be called in order to set the proper stride for the copy operation. GXCopyDisp 
specifies the XFB destination in main memory and actually issues the copy command. The destination 
XFB must begin at a 32-byte aligned address; moreover, the amount of memory required depends on 
width alignment. For a complete list of rules for allocating the correct amount of XFB memory, see "4.2.2 
Frame buffer allocation" on page 12 of the Video Interface Library (VI) section. 


12.1.7 Clear color and Z for next frame 


The copy operation can clear the color frame buffer and the Z buffer during the copy. This eliminates clear 
time when rendering the next frame. To perform a clear during the copy operation, use the clear parameter 
in GXCopyDisp. The clear parameter is only effective if the buffer has been enabled for update (see 
GXSetColorUpdate, GXSetAlphaUpdate, and GXSetZMode). This allows individual buffers to be 
cleared during the copy operation. The following function specifies the clear color and clear Z values to use 
during the copy operation: 


Code 90 - GXSetCopyClear 


void GxXSetCopyClear( GxXColor clear_clr, u32 clear _z); 
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The clear_cir parameter is in RGBA8 format, while the clear_z parameter is in 24-bit format. The parame- 
ters are converted into the proper format during the clear operation. The constant GX MAX 224 specifies 
the maximum depth value. 

12.2 Predefined render modes 


The set of controls necessary to correctly configure the GX and VI libraries is complex. Many game design- 
ers like to be able to customize these controls, so the GX and VI libraries provide all of the necessary con- 
trols to give developers maximum control. The GX API provides a set of predefined rendering modes 
containing all of the parameters necessary to make this task simpler. 


Each rendering mode contains the following data: 

¢« EFB and XFB size and format information. 

* Position of the XFB on TV screen. 

« TV video format (interlaced or double-strike version of NTSC, PAL, M/PAL). 

« Antialiasing state and deflicker filter. 

* Field-rendering mode (i.e., enabled or not). 

For information on TV video formats and field-rendering, refer to the Video Interface Library (VI) section. 


The diagram below illustrates the relationships between the render mode structure, related GX calls, and 
parts of the graphics hardware pipeline: 


Figure 63 - Render mode structure, related calls and hardware modules 





Render Mode GXSetFieldMode +) GP pipeline 


GXSetViewportJitter zy 
; ; viewport 
field rendering GXSetViewport +) 
GXSetScissor ) : 
scissor 























frame buffer size, format 



















































GXSetDispCopySrc 
GXSetPixelFormat EFB 
AA state, deflicker filter 
GXSetDispCopyYScale 
C GXSetDispCopyDst copy 
. GXCopyDsp 
TV video format aC 
GXSetCopyFilter 





XFB 











XFB position on TV 








C VIConfigure 7 Vi 

















RVL-06-0037-001-A © 2006 Nintendo 
Released: February 27, 2006 CONFIDENTIAL 


Predefined render modes 129 


We define seven basic rendering modes for each of the three main television video modes (NTSC, PAL, M/ 
PAL), making a total of 21 predefined rendering modes. These are described below. 
12.2.1 Double-strike, non-antialiased mode 


In this mode, VI outputs a double-strike, or non-interlaced, signal. This mode turns off antialiasing (AA) to 
speed up fill rate. For NTSC, this mode renders 640x240 lines at 60Hz. 


Figure 64 - Double-strike, non-antialiased mode 
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12.2.2 Double-strike, antialiased mode 

This mode is similar to the preceding one; however, it supports antialiasing (at a potentially reduced fill 
rate). 

12.2.3 Interlaced, non-antialiased, field-rendering mode 


In this mode, VI outputs an interlaced signal, i.e., the rendering alternates between even and odd fields to 
support field-rendering. Antialiasing is turned off for maximum fill rate. For NTSC, this mode renders 
640x240 lines at 60Hz. 


Notes: 
¢ Deflicker is not possible in this mode. 


« The fields must be completed and swapped before vertical retrace, or else the incorrect field is dis- 
played during the next display interval. 


Figure 65 - Interlaced, non-antialiased, field-rendering mode 


Even Fidd 


Odd Field 





12.2.4 Interlaced, antialiased, field-rendering mode 


This is the same as the preceding mode, except that it supports antialiasing (at a potentially reduced fill 
rate). 
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12.2.5 Interlaced, non-antialiased, frame-rendering, deflicker mode 


In this mode, VI outputs an interlaced signal. The entire frame is copied from EFB to XFB with a deflicker- 
ing filter. The video interface hardware can select whether to display the even or odd field within this frame 
buffer. For NTSC, this mode renders 640x480 lines at any frame rate. 


Figure 66 - Interlaced, non-antialiased, frame-rendering, deflicker mode 
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12.2.6 Interlaced, non-antialiased, frame-rendering, non-deflicker mode 


Similar to interlaced, frame-rendering, deflicker mode, except that this mode does not support deflickering. 


12.2.7 Interlaced, antialiased, frame-rendering, deflicker mode 


When rendering a large antialiased frame, the embedded frame buffer is not big enough to hold the entire 
frame, so two rendering passes are necessary to construct a single complete frame buffer. 


Figure 67 - Interlaced, antialiased, frame-rendering, deflicker mode 
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Since the vertical deflicker filter spans three lines, it is necessary to have some overlap in each pass where 
the images will be joined, as illustrated in "Figure 68 - Overlapping copy" on page 131. This diagram shows 
one extra line in each pass for a total of two lines of overlap. Due to the restriction that only even numbers 
of lines can be copied, you must actually have two extra lines from each pass, resulting in a total overlap of 
four lines. 


Note: Copy clamping must be disabled for the bottom of the first pass copy and for the top of the second 
pass copy. 
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Figure 68 - Overlapping copy 
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Drawing the top and bottom halves of the screen correctly involves adjusting the viewing frustum. For 
HW2, you can adjust the scissoring instead (refer to "15 GX updates for HW2" on page 155). See the 
demo frb-aa-full for an example. 
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12.3. GX API default render mode 


GXInit queries VIGetTVFormat to determine the GX API default render mode. The default mode may 
be one of the following, depending on format: 


* GXNtsc480IntDEf. 

: GXPal528IntDf. 

* GXMpal480IntDf. 

These modes feature: 

* Full-screen frame-based rendering. 

« Non-antialiased for the fastest fill rate performance. 
¢ Deflickered display to reduce flickering artifacts. 

« Interlaced display output. 


If DEMOTnit is called with a non-null render-mode pointer, then the referenced render mode is used. If the 
pointer is null, then a default render mode is used. In addition, if a default mode is used, DEMOInit will call 
GXAdjustForOverscan and trim 16 scan lines off the top and bottom of the screen. This adjustment is 
for demonstration purposes only; it is not guaranteed to make the viewport visible on all television sets. 








12.4 Embedded frame buffer formats 


The EFB has a maximum memory capacity of 2027520B = 640 x 528 x (3B(color) + 3B(Z)). The maximum 
pixel width and height of the frame buffer is determined by the size of each pixel. There are two different 
pixel sizes: 


¢ 48-bit color and Z. 
* 96-bit super-sampled color and Z. 


To set these formats, you must call GXSet Pixel Fmt. Setting the pixel format also controls the antialiasing 
mode. 


Note: You must also call GxSetCopyFilter when changing mode. 
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Code 91 - GXSetPixelFormat 


void GxXSetPixelFmt( GXPixelFmt pix fmt, GXZFmtl6 z fmt ); 


Changing pixel formats causes a flush of the rendering pipeline. Also, data existing in the frame buffer is 
not converted when you change formats, so mixed-format rendering is not possible in this manner. As a 
result, it may be necessary to clear the frame buffer again after changing modes. 

12.4.1 48-bit format — non-antialiasing 

The 48-bit format is intended for non-antialiasing; it has the following features: 

¢ 24-bit color (either 8/8/8 with no alpha, or 6/6/6/6 with 6 bits of alpha). 

* 24-bit Z. 

This format can support a maximum resolution of 640x528. The width must be between 0-640 and the EFB 
stride is fixed at 640 pixels. 

12.4.2 96-bit super-sampling format — antialiasing 

The 96-bit pixel format is for antialiasing and has the following features: 

* Samples of 16-bit color (5/6/5, no alpha). 

« Samples of 16-bit Z. 

This format can support a maximum resolution of 640x264. The width must be between 0-640 and the EFB 
stride fixed at 640 pixels. 

12.5 External frame buffer format 

Pixels in the XFB are stored as illustrated below: 


Figure 69 - XFB format in main memory 
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U(i) = 1/4 * U(i-1) + 1/2 * Uli) + 1/4 * U(i+1) 
V(i) = 1/4 * V(i-1) + 1/2 * V(i) + 1/4 * V(i+1) 


In the down-sampling process indicated above, clamping is included for the left and right edges. 
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The following computations illustrate the conversion of RGB to YUV: 
Equation 37 - RGB to YUV conversion 


¥= 0.257*R +0.504*G +0.098*B +16 
U=-0.148*R -0.291*G +0.439*B+128 
V= 0.439*R -0.368*G -0.071*B+128 


Note: The range for Y is only 16 <= Y <= 235. This is in order to meet the requirements of the video 
encoder. 


12.6 CPU direct EFB access 


The embedded frame buffer is memory-mapped within the CPU’s address space. Thus you may directly 
read from and write to the EFB from the CPU. (That being said, the access is not truly direct, since writes 
go through various pixel-processing operations before being written to the EFB, and reads are potentially 
decoded before being presented to the CPU.) 


Because writes become pixel operations, one may configure the CPU-EFB pixel pipeline as follows: 


Code 92 - Functions to configure CPU-EFB accesses 


void GXPokeAlphaMode( GXCompare func, u8 threshold ); 

void GXPokeAlphaRead( GXAlphaReadMode mode ); 

void GXPokeAlphaUpdate( GXBool update enable ); 

void GXPokeBlendMode( GXBlendMode type, GXBlendFactor src factor, 

GXBlendFactor dst_factor, GXLogicOp op ); 

void GXPokeColorUpdate( GXBool update enable ); 

void GXPokeDstAlpha( GXBool enable, u8 alpha ); 

void GXPokeDither( GXBool dither ); 

void GXPokeZMode( GXBool compare enable, GXCompare func, GXBool update enable ); 








Most of these functions correspond to their normal pipeline counterparts (for example, GK PokeZMode 
works just like GXSet ZMode, except that it only affects CPU writes to the Z buffer), so we describe the 
exceptions. 


GXPokeAlphaMode works similarly to GXSetAlphaCompare; however, only a single threshold may be 
used. GXPokeAlphaRead is used to control the alpha value that is read from the EFB when no alpha 
channel is present. 


GX includes these functions for EFB access: 
Code 93 - Functions for CPU-EFB access 


void GXPokeARGB( ul6 x, ul6 y, u32 color ); 
void GXPeekARGB( ul6 x, ul6 y, u32* color ); 
void GXPokeZ( ul6 x, ul6 y, u32 z ); 
void GXPeekZ( ul6 x, ul6 y, u32* z ); 
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Each of these functions results in a 32-bit CPU bus transaction that will write (poke) or read (peek) the 
color or Z value at the specified pixel location. These functions are implemented as follows: 


Code 94 - Implementation of GXPeek and GXPoke 
#define EFB ADDRESS 0x08000000 


void GXPeekARGB(ul6 x, ul6 y, u32* color) 
{ 


u32 addr; 
addr = (u32) OSPhysicalToUncached(EFB ADDRESS) + offset(x, y); 
*kcolor = (*(u32 *)addr); 


} 


void GXPokeARGB(ul6 x, ul6 y, u32 color) 
{ 


mS2 addr; 
addr = (u32) OSPhysicalToUncached(EFB ADDRESS) + offset(x, y); 
* (u32*)addr = color; 


} 


// where offset can be obtained from: 
// offset = ((y)<<12) + ((x)<<2); 


// for 2 access, add (1<<22) to addr 


It is possible to do Z-buffered writes to the color buffer; however, this requires the use of uncached 64-bit 
bus transactions. The best way to do this is by using the locked-down cache mechanism. 


Note: Colors are always in 32-bit format on the CPU side. The GP takes care of converting colors as 
necessary for reading and writing to the EFB. With respect to Z, things are a little bit different. 
When the frame buffer is in 24-bit mode, Z is read and written as a 24-bit integer. When the frame 
buffer is in 16-bit mode, the Z is not converted. The following convenience functions are provided 
to convert between a 24-bit integer Z value and one of the 16-bit Z formats: 


Code 95 - Functions to manipulate 16-bit Z formats 


u32 GXCompressZ16( u32 224, GXZFmt1l6 zfmt ); 
u32 GXDecompressZ16( u32 z16, GXZFmt16 zfmt ); 


Note: Color and Z always require 32 bits (each) to transfer, regardless of the frame-buffer format. 
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13 Graphics FIFO 


13.1 Description 
Figure 70 - GXFifoObj 
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The GX API transmits commands from the CPU to the Graphics Processor (GP) using a GXFifoObj 
structure. The GXFifoObj structure describes a region of main memory, allocated by the application, set 
aside for storing graphics commands. The FIFO can be attached to either the CPU or GP or both. When 
the FIFO is attached to the CPU, GX commands will be written to the FIFO. There is always one—and only 
one—FIFO attached to the CPU. When the FIFO is attached to the GP, the GP will read and process 
graphics commands. Only one FIFO can be attached to the GP ata time. 


The purpose of the FIFO is to allow the CPU and GP to work in parallel at close to their peak rates. There 
are two basic methods of using the FIFO to achieve parallelism: immediate mode and multi-buffer mode. 
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When a single FIFO is attached to both the CPU and the GP, the system is said to be in immediate mode. 
The FIFO read and write pointers are managed by hardware as a true FIFO. As the CPU writes graphics 
commands to the FIFO, the GP will process them in order. The hardware contains special flow control logic 
to prevent writes from over running reads and to wrap the read and write pointers from the last address of 
the buffer back to the first address. GXInit sets up the system to use immediate mode by default. Imme- 
diate mode is generally easier to use, because once it is set up, no further management by the application 
is required. 


Figure 71 - Immediate mode 
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It is also possible to connect one FIFO to the CPU while the GP is reading from a different FIFO. This is 
called multi-buffer mode. |n this case, the FIFOs are managed more like buffers than FIFOs, since there 
are no simultaneous reads and writes to a FIFO. You may choose multi-buffer mode if you require dynamic 
memory management of FIFOs; however, there are complications that make this choice less desirable. 
These will be described below. 


Figure 72 - Multi-buffer mode 
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The CPU always writes graphics commands to the FIFO in 32-byte transfers. To do this, the CPU has a 
special write-gather function that automatically packs graphics commands into 32-byte words. The GP 
always reads graphics commands from the FIFO in 32-byte transfers. 


13.2 Creating a FIFO 


The GX API declares a static GXFifoObj structure internally. This structure is initialized when GxXInit is 
called: 


Code 96 - GXFifoObj 


GXFifoObj* GXInit(void* base, u32 size); 


The FIFO base pointer must be aligned to 32 bytes. The application is responsible for allocating the mem- 
ory for the FIFO. OSAlloc returns 32-byte-aligned pointers. 


The size parameter passed to GXInit is the size of the FIFO in bytes, which must be a multiple of 32. The 
minimum FIFO size is 64KB. GxInit sets up the FIFO for immediate-mode graphics; i.e., both the CPU 
and GP are attached to the FIFO, the read and write pointers are initialized to the base pointer, and the 
high- and low-water marks (see "13.4 FIFO status" on page 139) are enabled. 


GXInit returns a pointer to the initialized GXFif00bj to the application. 
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If the application wants to operate in multi-buffered mode, then it must allocate additional FIFOs. The appli- 
cation must allocate the memory for each additional FIFO and initialize a GxFifoOb j as well. The follow- 
ing functions initialize the GXFifoObj: 


Code 97 - FIFO initialization functions 


void GXInitFifoBase ( 
GXFifoObj* fifo, 
void* base, 
u32 size); 
void GXInitFifoPtrs ( 
GXFifoObj* fifo, 
void* read _ptr, 
void* write ptr ); 
void GXInitFifoLimits ( 
GXFifoObj* fifo, 
u32 hi_ water mark, 





u32 lo_water mark ); 


Normally, the application only needs to initialize the FIFO read and write pointers to the base address of 
the FIFO. Once initialized, the system hardware will control the read and write pointers automatically. 


The application only needs to call GkInitFifoLimits when the FIFO will be used in immediate mode. 
This function sets the high and low water marks for the FIFO, which are not available in multi-buffer mode 
(see "13.5 FIFO flow control" on page 140). 


Note: These APIs are intended for use on FIFOs that are not attached to the CPU or GP. This is to pre- 
vent any temporary inconsistencies in the pointers and water mark values. These APIs will cause 
assertion failures if they are used on attached FIFOs. 


The following inquiry functions may be used to retrieve the data set above. To get the current FIFO point- 
ers, refer to "13.4 FIFO status" on page 139. 


Code 98 - FIFO basic inquiry functions 


void* GXGetFifoBase( GXFifoObj* fifo); 
u32 GXGetFifoSize( GXFifoObj* fifo); 
void GXGetFifoLimits( GXFifoObj* fifo, u32* hi, u32* lo); 


13.3 Attaching and saving FIFOs 


Once a FIFO has been initialized, it can be attached to the CPU or the GP or both. Only one FIFO may be 
attached to either the CPU or GP at the same time. Once a FIFO is attached to the CPU, the CPU may 
issue GX commands to the FIFO. When a FIFO is attached to the GP, it will be enabled to read graphics 
commands from the FIFO. The following functions attach FIFOs: 


Code 99 - FIFO attachment functions 


void GXSetCPUFifo( GXFifoObj* fifo ); 
void GXSetGPFifo( GXFifoObj* fifo ); 


You may inquire which FIFO objects are currently attached with these functions: 


Code 100 - FIFO attachment inquiry functions 


GXFifoObj* GXGetCPUFifo( void ); 
GXFifoObj* GXGetGPFifo( void ); 
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In multi-buffer mode, when the CPU is finished writing GX commands, the FIFO should be “saved” before 
switching to a new FIFO. The following function saves the CPU FIFO: 


Code 101 - GXSaveCPUFifo 


void GXSaveCPUFifo( GXFifoObj* fifo ); 


When a FIFO is saved, the write-gather buffer is flushed to make sure all graphics commands are written 
to main memory. In addition, the current FIFO read and write pointers are stored in the GXFifoObj struc- 
ture. 


Note: There is no save function for the GP. Once the GP is attached, graphics commands will continue to 
be read until either: 


* The FIFO is empty. 
* A FIFO breakpoint is encountered (see "13.6.3 FIFO breakpoint" on page 142). 


* The GP is preempted using GXAbortFrame (see "14 Performance metrics" on page 147). 


13.4 FIFO status 
You can use the following functions to read the status of a FIFO and the GP: 


Code 102 - FIFO status functions 


void GXGetFifoStatus ( 
GXFifoObj* fifo, 


GXBool* overhi, 
GXBool* underlo, 
u32* fifo_cnt, 
GXBool* cpu_write, 
GXBool* gp_read, 
GXBool* fifowrap ); 


void GXGetGPStatus ( 
GXBool* overhi, 





GXBool* underlow, 
GXBool* readIdle, 
GXBool* cmdIdle, 
GXBool* brkpt ); 


void GxXGetFifoPtrs ( 
GXFifoObj* fifo, 
void** readPtr, 
void** writePtr ); 


Use GxXGetFifoStatus to get the status of a specific FIFO. If the FIFO is currently attached to the CPU, 
the parameter cpu_write will be GX_TRUE. When the FIFO is currently attached to the GP, the parameter 
gp_read will be GX_TRUE. When a FIFO is attached to either the CPU or the GP, the status will be read 
directly from the hardware’s state. If the FIFO is not attached, the status will be read from the GXFifo0bj. 
GXGetFifoStatus reports whether the specified FIFO is over its high water mark (overhi) or below its 
low water mark (underlo). 








If the FIFO is currently attached to the CPU, executing GxGet FifoStatus causes a flush before getting 
the status. This will cause 32 bytes of NOPs to be written to the FIFO. As a result, you should not call 
GXGetFifoStatus ona very frequent basis. 


© 2006 Nintendo RVL-06-0037-001-A 
CONFIDENTIAL Released: February 27, 2006 


140 Graphics Library (GX) 


In general, the hardware cannot detect when a FIFO overflows; i.e., when the amount of data exceeds the 
size of the FIFO. Thus, the value of fifo_cnt (number of 32-byte blocks in the FIFO) may not necessarily be 
accurate if an overflow has occurred. 


Although there is no general way to detect FIFO overflows, the hardware can detect when the CPU write 
pointer reaches the top of the FIFO. If this condition has occurred, the fifowrap argument will return 
GX_TRUE. Thus, you can use the fifowrap argument to detect FIFO overflows if you ensure that the CPU’s 
write pointer is always initialized to the base of the FIFO. The fifowrap argument is set only if the FIFO is 
currently attached to the CPU. 





Use GXGetGPStatus to get the status of the GP (regardless of the FIFO that is attached to it). The mini- 
mum requirement before attaching a new GP FIFO is to wait for the readidle status to be GX_ TRUE. Nor- 
mally, additional requirements would include making sure that all graphics commands have been rendered 
into the EFB, and that the EFB has been copied to main memory. The cmdidie status provides the addi- 
tional information that the command processor is idle. 





The parameters underlow and overhi indicate where the write pointer is, relative to the high and low water 
marks. They do not indicate any error in processing. 


GXGetFifoPtrs may be used to request the read and write pointers of the given FIFO. If the given FIFO 
is attached to either the CPU or the GP, the appropriate hardware registers are read in order to provide the 
correct information. 


13.5 FIFO flow control 


When a FIFO is attached to both the CPU and GP (immediate mode), care must be taken so that the CPU 
stops writing commands when the FIFO is too full. The high water mark defines how full the FIFO can get 
before graphics commands will no longer be written to it. Since there may be up to 16KB of buffered graph- 
ics commands in the CPU, we recommend that you set the high water mark to the FIFO size less 16KB. 
(This 16KB figure comes into play if the locked-cache mechanism is used to write to a FIFO.) 


When the high water mark is encountered, the program will be suspended, but other interrupt-driven tasks 
such as audio will continue. 


Note: Ina multi-threaded program, the library will have to choose a particular thread to suspend. By 
default, the thread that called GxInit is suspended (which, in a single-threaded application, 
would be the main loop). However, you may designate a different thread to be suspended with the 
following APIs: 


Code 103 - APIs to get and set the current GX thread 


OSThread* GXGetCurrentGxXThread ( void ); 
OSThread* GXSetCurrentGxXThread ( void ); 


GXSetCurrentGxThread will designate the calling thread as the current GX thread and return a pointer 
to the previous GX thread. GxGetCurrentGxXThread will return a pointer to the current GX thread. 


Note: Itis a programming error to call G<¥SetCurrentGXThread while the previous GX thread is sus- 
pended waiting for a low water mark. This condition indicates that, potentially, your program has 
two threads generating GX data. An assertion failure will occur in this situation. 


The low water mark defines how empty the FIFO must become after reaching a high water mark before the 
program (or GX thread) is allowed to continue. We recommend that the low water mark be set to (FIFO 
size / 2). The low water mark prevents frequent context switching in the program, since it does not need to 
poll some register or constantly receive overflow interrupts when the amount of new command data stays 
close to the high water mark. 
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When in multi-buffered mode, the high and low water marks are disabled. When a FIFO is attached to the 
CPU, and the CPU writes more commands than the FIFO will hold, the write pointer will be wrapped from 
the last address back to the base address. Previous graphics commands in the FIFO will be overwritten. It 
is possible to detect only when the write pointer wraps over the top of the FIFO (which indicates an over- 
flow only if the FIFO’s write pointer was initialized to the base of the FIFO before commands were sent). 
See GXGetFifoStatus for more information. 


In order to prevent FIFO (buffer) overflow in multi-buffered mode, the application must use a software- 
based checking scheme. The program should keep its own counter of the buffer size, and before any 
group of commands is added to the buffer, the program should check and see if there is room. If room is 
available, the size of the group should be added to the buffer size. If room is not available, the buffer 
should be flushed and a new one allocated. 


Instead of using multi-buffered mode, it may be preferable to use a single large FIFO along with break- 
points to simulate multi-buffering. Breakpoints are discussed in the next section. 


13.6 Draw synchronization functions 


The rendering pipeline consists of several asynchronous components. Among them are the CPU generat- 
ing graphics commands, the GP consuming the commands and producing frame buffers, and the VI dis- 
playing the frame buffers. We provide several mechanisms to synchronize these components, allowing for 
various programming models (with different levels of complexity). 


The CPU must be coordinated with the GP since not all of the graphics data goes through the FIFO. All the 
indexed data and texture data that the CPU provides must remain available until the GP has read it, after 
which it can be altered for the next frame or deleted as necessary. The GP must be coordinated with the VI 
so that the EFB is copied only to an inactive external frame buffer XFB, and so that VI will switch to scan- 
ning out the new XFB at the right time, freeing up the previously scanned-out XFB. 


First, we describe the mechanisms that are available for synchronization. Then we show how to use the 
different mechanisms for various synchronization schemes with varying levels of complexity and efficiency. 


13.6.1 GXDrawDone 


We have mentioned GXDrawDone briefly already. G<DrawDone is actually a wrapper around two synchro- 
nization functions: GXSetDrawDone and GXWaitDrawDone. The former sends a draw-done token into 
the FIFO and flushes it, while the latter waits for the pipeline to flush and the token to appear at the bottom 
of the pipe. Instead of waiting for the token, you can also make use of a callback that occurs as a result of 
a draw-done interrupt. This callback runs with interrupts disabled, and thus must complete quickly. The 
function to set the callback routine also returns the old callback function pointer. The draw-done functions 
are summarized below: 


Code 104 - GXDrawDone synchronization commands 


void GXDrawDone(); 

void GXSetDrawDone(); 

void GxXWaitDrawDone(); 

typedef void (*GXDrawDoneCallback) (void) ; 

GXDrawDoneCallback GXSetDrawDoneCallback (GXDrawDoneCallback cb); 


13.6.2 GXDrawSync 


In order to detect that the pipeline has completely rendered geometry, you can use the functions 
GXSetDrawSync and GXReadDrawSync. Use GXSetDrawSync to send a token (a 16-bit number of your 
choosing) down the pipeline after rendering the geometry. This token will be stored in a special token reg- 
ister when it reaches the bottom of the pipeline. Use GkxReadDrawSync to poll the token register. When 
the token register value returned matches the token you sent, the geometry has been rendered completely. 
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It is also possible to receive an interrupt when the draw token reaches the bottom of the pipeline. The 
application can register a callback, using GXSetDrawSyncCallback, that will be called by the interrupt 
handler. The callback’s argument is the value of the most-recently-encountered token. Since it is possible 
to miss tokens (because graphics processing does not stop while the callback is running), your code 
should be capable of deducing if any tokens have been missed (e.g., by using monotonically increasing 
values). 


The draw-sync mechanism is similar to the draw-done mechanism, with two major differences. First, draw- 
sync allows you to insert a numbered (16-bit) token into the pipe and read the token value when it reaches 
the pipe bottom. Second, draw-sync does not force the pipe to be flushed, and thus does not create a 
“bubble” of idle cycles within the pipe. The draw-sync functions are summarized below: 


Code 105 - GXDrawSync synchronization commands 


void GXSetDrawSync(ul6 token); 

ul6 GXReadDrawSync(); 

typedef void (*GXDrawSyncCallback) (ul6 token); 
GXDrawSyncCallback GXSetDrawSyncCallback(GxXDrawSyncCallback cb); 


13.6.3 FIFO breakpoint 


Sometimes it is useful to write two or more frames of graphics to the same FIFO. The breakpoint feature 
will cause GP FIFO reads to be disabled when the FIFO read pointer matches the breakpoint value. The 
breakpoint can be set using: 


Code 106 - GXEnableBreakPt 


void GxXEnableBreakPt( void* break pt ); 


You can re-enable GP FIFO reads by using: 
Code 107 - GXDisableBreakPt 


void GXDisableBreakPt( void ); 


For example, after writing frame A of graphics, read the current CPU FIFO write pointer using 
GXGetFifoPtrs. Make this the current breakpoint using GXxEnableBreakPt. Continue writing frame B 
of graphics into the FIFO. GP FIFO reads will be disabled when the read pointer reaches the break point. 
The read_idle status can be polled for this event. The application can enable processing of frame B graph- 
ics by calling GXDisableBreakPt. 





There is also a CPU interrupt that is associated with the break point. You can define a callback to be exe- 
cuted when this interrupt occurs. This callback will run with interrupts disabled, and thus it is necessary for 
the callback to run as quickly as possible. The callback can be set using GxSetBreakPtCallback, which 
also returns the previous callback: 


Code 108 - GXSetBreakPtCallback 
typedef void (*GXBreakPtCallback) (void); 


GXBreakPtCallback GXSetBreakPtCallback( GXBreakPtCallback cb ); 


For an example of how to use the same FIFO to manage two frames of graphics with breakpoints, take a 
look at mgt-fifo-brkpt in the gxdemos/Management branch of the source tree. 
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13.6.4 Abort frame 


GX allows you to halt the GP and flush all commands currently in the FIFO up to the next break point or the 
end of the FIFO (if no break point is set). The following command achieves this: 


Code 109 - GXAbortFrame 


void GXAbortFrame( void ); 


This command resets all state in the GP. Textures loaded in texture memory are retained, but if a load was 
in progress, you must make sure it was not aborted (you should use draw syncs to verify this). When start- 
ing the next frame, you should send down new, complete state information (you should not assume that 
any state has been retained from the aborted frame). 


13.6.5 Vi synchronization 


The preceding functions synchronize the CPU with the GP. In order to synchronize the CPU with the video 
output, you may use the following functions: 


Code 110 - VI synchronization commands 


void VIWaitForRetrace(); 

typedef void (*VIRetraceCallback) (u32 retraceCount) ; 
VIRetraceCallback VISetPreRetraceCallback(VIRetraceCallback cb); 
VIRetraceCallback VISetPostRetraceCallback (VIRetraceCallback cb); 


The VIWaitForRetrace function suspends the current thread until a vertical retrace occurs. When the 
retrace does occur, an interrupt is sent to the CPU. The handler for this interrupt will first call the “pre” 
retrace callback. It will next update the VI hardware registers, and then it calls the “post” retrace callback. 
Finally, it wakes any threads that are waiting for retrace. The callback routines run with interrupts off, and 
therefore must complete quickly. The functions to set the callback routines also return the old callback 
function pointer. For more details on these functions, refer to the Video Interface Library (VI) section. 


13.7 Draw synchronization methods 


13.7.1 Double-buffering 


The simplest programming model uses double-buffering and G<DrawDone to coordinate the CPU with the 
GP. Having the EFB and one XFB would seem to be enough to provide for double-buffering; however, due 
to synchronization issues, it is simpler to have two XFBs. This allows the copy operation to be performed 
after graphics rendering is finished without having to stall until vertical retrace occurs. The following end-of- 
frame code sequence illustrates simple double-XFB synchronization: 


Code 111 - Double-buffer copy synchronization 


// ... Avraw the image in the EFB, then: 

GXCopyDisp(xfb, GX_TRUE) ; 

GxXDrawDone() ; 

VISetNextFrameBuffer (xfb) ; 

VIFlush(); 

VIWaitForRetrace(); 

xfb = (xfb == xfbl) ? xfb2 : xfbl; 
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With only one XFB, the copy operation must be performed during vertical retrace. The copy operation can 
be completed in about 0.5 milliseconds, thus finishing before the next video field begins to display. Since 
the copy command must be issued through the FIFO, you must hold up the FIFO until vertical retrace 
occurs. In addition, the copy command must be guaranteed to issue immediately upon vertical retrace. 
The latter requirement means that the copy command must be issued during a vertical retrace interrupt 
callback. The following code illustrates this method: 


Code 112 - Single-buffer copy synchronization 


// main loop: // vertical retrace interrupt “post” callback: 
// ... draw the image in the EFB, then: if (do_copy) { 

GXDrawDone(); GXCopyDisp(xfb, GX_TRUE) ; 

do_copy = GX_TRUE; GXFlush () ; 

ViIWaitForRetrace(); do_copy = GX_FALSE; 


13.7.2  Triple-buffering 


Both of the above methods idle the CPU and the GP while waiting for vertical retrace. You can avoid this 
idling by using triple-buffering. This allows the graphics pipeline to run without having to wait for video 
refresh. You can perform triple-buffering in various ways. We will describe a method that uses only two 
XFBs. 


When you decide not to wait for draw-done and vertical retrace before continuing drawing, various compli- 
cations arise. One is that indexed data and dynamic texture data must remain available until you know the 
GP is done with it. Another is that you may finish drawing two frames before VI has scanned out only one. 
In this case, the second frame will have nowhere to go since both XFBs will be occupied. The second copy 
must wait until the vertical retrace period. Fortunately, these problems can be solved by use of the syn- 
chronization commands provided. 


The breakpoint must be used in order to hold back the frame buffer copy commands. The draw sync token 
must be used to detect when the copy commands have completed, and the vertical retrace callbacks must 
be used to coordinate the copies and buffer swaps. To see all of this in action, refer to the demo mgt- 
triple-buf.c located within the gxdemos/Management branch of the source tree. 


13.8 Graphics FIFO vs. display list 


Writing graphics commands to a command FIFO differs from writing graphics commands to a display list in 
several respects. When writing commands to a FIFO you may use GXCallDisplaylList to call a display 
list. Display lists (bracketed by GXxBeginDisplayList/GXEndDisplayList) may not themselves call a 
display list. 








Using GXSetGPFi fo, the application attaches a FIFO to the GP to enable processing of the FIFO’s graph- 
ics commands. A display list is called using GXCallDisplayList. A FIFO may be attached to the CPU 
and the GP simultaneously. The CPU creates a display list, a call command is issued into a command 
FIFO, and the GP reads and processes the display list. 


13.9 Notes about the write-gather pipe 


The CPU write-gather pipe is a mechanism for doing fast uncached writes to main memory. It consists of a 
128-byte circular queue organized as four 32-byte cache lines. It collects together the writes to a single 
memory address and stores them in the queue. When a cache line is filled, it is scheduled to be written to 
memory while another cache line continues to absorb the writes. When combined with the graphics pro- 
cessor’s FIFO mechanism, this allows fast writing to arbitrary areas of memory, provided that all the writes 
are 32-byte aligned. 
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Flushing the write-gather/FIFO mechanism is typically done in GX by writing 32 NOPs. In general, these 
NOPs are not cleared out whenever the write-gather is redirected (i.e., when using 
GXBeginDisplayList or GXEndDisplayList). Therefore, a given stream (display list) may include up 
to 31 extra NOPs at the start and up to 32 extra NOPs at the end. 





Since the write-gather pipe is a handy way to blast data into memory, a couple of extra APIs have been 
created to make this easy: 


Code 113 - APIs to control the write-gather pipe 


volatile void* GXRedirectWriteGatherPipe ( void * ptr ); 
void GXRestoreWriteGatherPipe (); 


The first API allows the write-gather pipe to be redirected to an arbitrary, 32-byte aligned address. It returns 
a pointer to the write-gather register to which the application should actually write the data. The second 
API restores the write-gather pipe to where it had been before it was redirected. 


These APIs handle flushing differently than the rest of GX. They will clear out any extra zeros that were 
used to flush the write-gather pipe. Also, the restore function flushes by writing only 31 zeros, thus avoid- 
ing the possibility of writing out one too many cache lines. 


Note: You cannot issue most GX commands while the write-gather pipe has been redirected. 


13.10 GX verify 


The debug version of the GX library has a verify feature which can be used to check for certain state-set- 
ting errors. This checking happens when GxBegin is called. The following APIs control this feature: 


Code 114 - APIs to control verification 


typedef enum { 


GX_WARN_ NONE, // no warnings reported 
GX_WARN_SEVERE, // veports only severest warnings 
GX_WARN_ MEDIUM, // veports severe and medium warnings 
GX_WARN_ ALL // veports any and all warning info 

} GXWarningLevel; 


void GXSetVerifyLevel( GXVerifyLevel level ); 


typedef void (*GXVerifyCallback) (GXVerifyLevel level, 
u32 id, 
char* msg) ; 


GXVerifyCallback GXSetVerifyCallback( GXVerifyCallback cb ); 


As you can see, you can control the level of checking that occurs. The higher the verification /evel is set, 
the longer it takes. 


Note: Even setting the highest level of verification will not uncover all possible errors; there are still many 
kinds of errors for which the verification function does not check. 


When the verification function finds an error, it calls the GX verification callback. There is a default callback 
function that simply prints the /evel, id, and msg out to the debug console. You may change the error- 
reporting behavior by substituting your own callback function using GXSetVerifyCallback. 
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14 Performance metrics 


Application developers can access internal performance counters in the Graphics Processor. Statistics 
gathered using the performance counters might be useful in tuning the application for the highest perfor- 
mance, or for making load-balancing decisions at runtime. 

14.1 Types of metrics 

The GP metrics are grouped into different categories: 

¢ GP front-end and texture-related metrics. 

* Vertex cache metrics. 

« Pixel metrics. 

* Memory metrics. 


A set of GX functions exists to handle each category of metrics. The functions themselves fall into three 
categories: 


* Set functions choose exactly which metric to count. 

* Read functions read the value of the counter. 

« Clear functions clear the counter back to zero. 

The pixel and memory metrics do not have any “set” functions since all of the available counters may be 
read at once. 

14.2 GP front-end and texture-related metrics 

The following functions are used to control the performance counters for various GP-related events: 


Code 115 - GP metric functions 


void GXSetGPMetric( GXPerf0 perf0, GXPerfl perfl ); 
void GXReadGPMetric( u32* cnt0, u32* cntl ); 
void GXClearGPMetric( void ); 


// macros to deal with counter 0 only: 
void GXSetGPOMetric( GXPerf0O perfO ); 
u32 GXReadGPOMetric( void ); 

void GXClearGPOMetric( void ); 


// macros to deal with counter 1 only: 
void GXSetGP1lMetric( GXPerfl perfl ); 
u32 GXReadGPlMetric( void ); 

void GxXClearGPlMetric( void ); 





Note: There are two counters which are set, read, and cleared at the same time. The functions that deal 
with counters 0 or 1 are generally macros that set the desired counter and turn off the other one. 
You cannot clear one counter without clearing the other at the same time. 


The subsequent sections of this chapter describe the various counters. 


14.2.1 GP Counter 0 details 
GX_PERFO_VERTICES 


This metric returns the number of vertices processed by the GP as measured by the transform engine 
(XF unit). 
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GX_PERFO_CLIP_VTX 

Returns the number of vertices that were clipped by the GP. 
GX_PERF0O_CLIP_CLKS 

Returns the number of GP clocks spent clipping. 


The transform engine (XF) in the GP is a pipeline that has an input stage, parallel transform and light- 
ing stages, and a “bottom of pipe” processor which merges the results of lighting and texture coordi- 
nate generation. The following performance counters measure how many GP cycles are spent in each 
stage of the XF. 


GX_PERF0O_XF_WAIT_IN 


Measures by how many cycles the XF has been waiting on input. If the XF is waiting a large percent- 
age of the total time, it may indicate that the CPU is not supplying data fast enough to keep the GP 
busy. 


GX_PERF0O_XF_WAIT_OUT 


Measures by how many cycles the XF waits to send its output to the rest of the GP pipeline. If the XF 
cannot output, it may indicate that the GP is currently fill-rate limited. 


GX_PERFO_XF_XFRM_CLKS 

Indicates the number of cycles that the transform engine is busy. 
GX_PERF0O_XF_LIT_CLKS 

Indicates the number of cycles that the lighting engine is busy. 
GX_PERF0O_XF_BOT_CLKS 

Indicates the number of cycles that the bottom of the pipe is busy. 


The XF contains various state registers that control its processing. The registers are normally set using various 
functions of the GX API. The following counters measure state-register accesses. 


GX_PERF0O_XF_REGLD_CLKS 
Measures how many cycles are spent loading (writing to) XF registers. 
GX_PERF0O_XF_REGRD_CLKS 
Measures how many cycles are spent reading XF state registers. 
GX_PERF0_TRIANGLES* 
The triangle metrics allow the counting of triangles under specific conditions or with specific attributes. 
* GX_PERFO_TRIANGLES counts all triangles. 
« GX_PERFO_TRIANGLES_ CULLED counts triangles that failed the front/backface culling test. 
« GX_PERFO_TRIANGLES PASSED counts triangles that passed the front/backface culling test. 
« GX_PERFO_TRIANGLES SCISSORED counts the triangles that are scissored. 


« GX_PERFO_TRIANGLES *TEX count triangles based on the number of texture coordinates sup- 
plied. 


« GX_PERFO_TRIANGLES *CLR count triangles based on the number of colors supplied. 
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GX_PERF0_QUAD* 


The quad metrics allow you to count the number of quads (2x2 pixels) the GP processes. The term 
coverage is used to indicate how many pixels in the quad are actually part of the triangle being raster- 
ized. For example, a coverage of 4 means that all pixels in the quad intersect the triangle. A coverage 
of 1 indicates that only one pixel in the quad intersects the triangle. 


« GX_PERFO_QUAD_OCVG indicates the number of quads having 0 coverage. 
« GX_PERFO_NONOCVG counts the number of quads that have greater than zero coverage values. 
* GX_PERFO_QUAD [1-4]CVG counts the quads having the given coverage. 


« GX_PERFO_AVG QUAD _ CNT indicates the average quad count (number of pixels covered 
divided by 4). 


GX_PERF0O_CLOCKS 


GX_PERF0O CLOCKS counts the number of GP clocks that have elapsed since the previous call to 
GXReadGPOMetric. 


GX_PERFO_NONE 


This metric disables counting on GP counter 0 and clears the current count. 











14.2.2 Counter 1 details 
GX_PERF1_TEXELS 

This metric returns the number of texels processed by the GP. 
GX_PERF1_TX_IDLE 

Returns the number of clocks that the texture unit (TX) is idle. 
GX_PERF1_TX_REGS 

Returns the number of GP clocks spent writing to state registers in the TX unit. 
GX_PERF1_TX_MEMSTALL 

Returns the number of GP clocks the TX unit is stalled waiting for main memory. 


GX_PERF1_TC_CHECK1_2 
GX_PERF1_TC_CHECK3_4 
GX_PERF1_TC_CHECK5 6 
GX_PERF1_TC_CHECK7_8 
GX_PERF1_TC_MISS 


These metrics can be used to compute the texture cache (TC) miss rate. The TC_CHECK* parame- 
ters count how many texture cache lines are accessed for each pixel. In the worst case, for a mipmap, 
up to 8 cache lines may be accessed to produce one textured pixel. GX_PERF1_TC_MISS counts 
how many of those accesses missed the texture cache. To compute the miss rate, calculate: 


Equation 38 - Miss rate calculation 
GX_PERF1_TC MISS 


GX TC PERF1 TC CHECK1 2+GX PERF1 TC CHECK3 4+GX PERF1 TC CHECK 5 6+GX PERF1 TC CHECK 7 8 
GX_PERF1_VC_ELEMQ_FULL 
Counts vertex cache stalls due to its element queue being full. 
GX_PERF1_VC_MISSQ 


Counts vertex cache stalls due to its miss queue being full. 
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GX_PERF1_VC_MEMREQ_FULL 
Counts vertex cache stalls due to too many outstanding main memory requests. 
GX_PERF1_VC_STATUS7 


Counts vertex cache stalls due to too many elements in the element queue depending upon a single 
cache line. 


GX_PERF1_VC_MISSREP_FULL 
Counts vertex cache stalls due to a cache miss with all sets still in use (no replacement available). 
GX_PERF1_VC_STREAMBUF_LOW 


Counts vertex cache stalls due to the near-empty FIFO (streaming buffer) having priority over the ver- 
tex cache. 


GX_PERF1_VC_ALL_ STALLS 

Counts all of the above-mentioned vertex cache stall conditions. 
GX_PERF1_VERTICES 

This metric returns the number of vertices processed by the GP as measured by the vertex cache. 
GX_PERF1_FIFO_REQ 

This metric counts the number of lines (32B) read from the GP FIFO. 
GX_PERF1_CALL_REQ 

This metric counts the number of lines (32B) read from called display lists (GXCallDisplayList). 
GX_PERF1_VC_MISS_REQ 


This metric counts the number of vertex cache miss requests. Each miss requests a 32B transfer from 
main memory. 


GX_PERF1_CP_ALL_REQ 


This metric counts all requests (32B/request) from the GP command processor (CP). It should be 
equal to the sum of counts returned by GX_PERF1_FIFO_REQ, GX_PERF1_CALL_REQ, and 
GX_PERF1_VC_MISS_REQ. 


GX_PERF1_CLOCKS 


GX_PERF1_CLOCKS counts the number of GP clocks that have elapsed since the previous call to 
GXReadGP1Metric. 


GX_PERF1_NONE 


This metric disables counting on GP counter 1 and clears the current count. 
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14.3. Using performance counters 


The performance counter functions directly access GP registers, and thus they only work in immediate 
mode. However, they measure information which is sent through the GP FIFO, and so some synchroniza- 
tion is necessary in order to make sure the desired data is measured properly. The following code 
sequence illustrates one possible approach. 


Code 116 - Counting a metric 
u32 metricl, metric2; 


// Set desired metrics 
GXSetGPMetric (GX_PERFO_ VERTICES, GX_PERF1 TEXELS) : 


// Clear counters 
GXClearGPMetric(); 


// Send down non-end draw-sync token 
GXSetDrawSync (0x0); 


// Draw Object (s) 


// Send down ending draw-sync token, wait for it 
GXSetDrawSync (0Oxbeef) ; 
while (Oxbeef != GXReadDrawSync() ) 


r 


// Read the counters 
GXReadGPMetric(&metricl, &metric2); 
OSReport ("Number of verts: %d texels: %d\n", metricl, metric2); 


14.4 Vertex cache metrics 
Use the following functions to control the performance counters for vertex cache-related events: 


Code 117 - Vertex cache metric functions 


void GXSetVCacheMetric( GXVCachePerf attr ); 
void GXReadVCacheMetric( u32* check, u32* miss, u32* stall ); 
void GxXClearVCacheMetric( void ); 


The “set” function allows you to choose which vertex attribute will be measured. You can choose a value of 
GX _VC_ALL in order to measure all of the attributes at once. For any given attribute selection, three met- 
rics are available: 


¢ check indicates the total number of accesses to the vertex cache. 
* miss indicates the total number of misses when accessing the vertex cache. 
¢ — stall indicates the number of GP clocks the GP is stalled waiting on the vertex cache. 


The stall count measures how often the command processor of the GP must wait on cache requests being 
filled. 
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14.5 Pixel metrics 
The following functions are used to control the performance counters for pixel-related events: 


Code 118 - Pixel metric functions 


void GXReadPixMetric( u32* top pixels in, 
u32* top pixels out, 
u32* bot_pixels in, 
u32* bot pixels out, 
u32* clr pixels in, 
u32* copy _clks ); 
void GXClearPixMetric( void ); 


The GP can be configured to Z-buffer before or after texture lookup (see GXSetZCompLoc). The parame- 
ter top_pixels_in returns the number of pixels entering the Z compare before texture lookup. The parame- 
ter top_pixels_out indicates how many pixels passed this Z compare test. 


The parameter bot_pixels_in counts the number of pixels entering the Z compare after texture lookup. The 
parameter bot_pixels_out indicates how many pixels passed this Z compare test. 


The parameter cir_pixels_in counts the number of pixels processed by the blend unit in the last stage of 
the pipeline. This is normally the sum of top_pixels_out and bot_pixels_out. 


The parameter copy_ciks counts the number of GP clocks spent on copy operations, either from the EFB 
to a texture (see GXCopyTex), or from the EFB to a display buffer (see GXCopyDisp). 

14.6 Memory metrics 

The following functions are used to control the performance counters for memory-related events: 


Code 119 - Memory metric functions 


void GxXReadMemMetric( u32* cp_req, 
u32* tc_req, 
u32* cpu_rd_req, 
u32* cpu_wr_req, 
u32* dsp_req, 
u32* io_req, 
u32* vi_req, 
u32* pe_req, 
u32* rf_req, 
u32* fi_req ); 





void GXClearMemMetric( void ); 
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The various metrics are explained in the following table: 


Table 15 - Memory metrics 


Metric Purpose 
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cp_req The command processor (CP) is responsible for reading the Graphics FIFO, read- 
ing display lists (GXCallDisplayList), and servicing vertex cache misses. This 
metric returns the number of memory read requests issued by the CP. 

tc_req Returns the number of memory read requests issued by the Texture Cache (TC). 

Cpu_rd_req Returns the number of memory read requests made by the CPU. 

cp_wr_req Returns the number of memory write requests made by the CPU. 

Dsp_req Returns the number of memory requests made by the Audio DSP. 

io_req Returns the number of memory requests made by IO devices. 

vi_req Returns the number of memory read requests made by the Video Interface (VI). 

pe_req Returns the number of memory write requests made by the Pixel Engine (PE). 
These include texture copies (GXCopyTex) and display copies (GXCopyDisp). 

rf_req Returns the number of memory refresh requests. 

fi_req Returns the number of Forced Idle (Fl) requests, which are dummy requests 
required to switch the bus direction (i.e., read to write, or write to read). 
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15 GX updates for HW2 


In this chapter, we describe the changes and additions to the GX API that are available on HW2 systems 
(i.e., those systems that feature the second generation or later Graphics Processor). For the most part, GX 
for HW2 is backward compatible with GX for HW1. There are a couple of exceptions which we will describe 
first, followed by a brief discussion of hardware bugs (i.e., HW1 bugs fixed, and HW2 bugs remaining). 
Finally, we describe the new capabilities of HW2 and how they are implemented in GX. 


15.1 Compatibility 

In general, GX for HW2 is backward compatible with GX for HW1. There are two exceptions: 
* GXSetTevClampMode is gone. 

* GXSetTevColorIn does not handle texture component swap. 


GXSetTevClampMode is gone because the non-linear TEV clamp functionality has been removed from 
the HW2 GP (the linear clamp mode still exists as the only possible clamp mode). New TEV compare func- 
tionality has been implemented in HW2 to replace the missing clamp modes; this is described in "15.3 New 
HW2 features" on page 155. 


The HW2 TEV has more flexible options for swapping the texture and raster color components. As a result, 
we added new API functions to control the component swapping. Due to collisions between the new API 
and the texture component swapping that may be done with GXSetTevColorIn, the HW2 version of 
GXSetTevColorIn does not handle any texture component swapping. Consequently, you cannot select 
argument values of GX_CC_TEXRRR, GX_CC_TEXGGG, or GX_CC_TEXBBB. In addition, selecting 
GX_CC_TEXC will not turn off texture component swapping. 














Since GXSetTevOp calls GXSetTevColortIn, it may also behave differently under HW2. With HW2, you 
cannot use GXSetTevOp in order to cancel a texture component swap (this example works on HW'1). 


If you desire the HW1 functionality of G<Set TevColorIn under both HW1 and HW2 platforms, you may 
use the function DEMOSetTevColoriIn (from the DEMO library). Similarly, if you need GXSetTevOp to be 
able to affect texture component swap (under both platforms), you should use DEMOSetTevoOp. 








15.2 Bugs 


All bugs are noted in the "HW2 Errata" section in this guide. 


15.3 New HW2 features 

The HW2 version of GX has the following added features: 

« NBT indices can be separated. 

* Fractional shift works with 8-bit vertex attributes. 

« Renormalization and “post-transform” matrices added for texgens. 
« Line (but not point) aspect ratio fixed for field-mode rendering. 

« New TEV compare functions added. 

¢ More flexibility in TEV for texture and raster color component swaps. 
« New TEV “constant” color registers, component selectable. 

* Subtractive “blend” mode. 

« New texture copy types (for both color and Z). 


¢« Scissor box offset. 
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The rest of this chapter describes these features in more detail. 


15.3.1 NBT indices can be separated 


In HW1, an indexed NBT attribute could be specified only by a single index pointing to a triple of NBT val- 
ues. Due to the large number of possible NBT triples, indexed NBT normals were not very useful. 


These indices can be separated on HW2. In addition, there is an implied offset for the different normals 
that make up the triple. For separated indices, the attribute address is computed as follows: 


Equation 39 - Attribute address for separated NBT indices 
address = array_base + index * array_stride + (3 * NBT_offset + XYZ_component) * component_size 
Where NBT_ offset is N=0,B=1, T=2 
and XYZ_componentis X=0, Y=1,Z=2 


The application specifies separated NBT indices by specifying a component count of GX_NRM_ NBT3 in 
GXSetVtxAttrFmt. When matched with a vertex descriptor that specifies indexed normals, this combina- 
tion specifies that three separate indices will be provided instead of just one. 


There are two possible ways to set up a normal table for NBT normals. One is to set up three interleaved 
tables. This method is straightforward, since the built-in NBT offset takes care of choosing the right value 
from the interleaved tables. Another method is to set up a single table (after all, a normal is a normal 
regardless of the name). In order to implement this method, you must take into account the NBT offsets 
and adjust the indices used in order to cancel out the offsets. Thus: 


Equation 40 - Index adjustments for NBT offsets 
index for N = index in table 
index for B = ((index in table -1) + limit) % limit 
index for T = ((index in table -2) + limit) % limit 


Where limit = 256 for 8-bit indices, or 65536 for 16-bit indices 


15.3.2 Fractional shift works with 8-bit vertex attributes 


With HW1, 8-bit vertex attributes could only be whole numbers with a zero fractional shift. This has been 
fixed for HW2; therefore, the fraction argument for GXSet Vt xAttrFmt now applies for 8-bit attributes as 
well as for 16-bit attributes. 
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15.3.3. Renormalization and “post-transform” matrices added for texgens 
The following figure details the additional functionality in the texgen computation path for HW2: 


Figure 73 - Texgen computation path 


Source Select 
GX_TG_TEX 
GX_TG_POS 
GX_TG_NRM 
GX_TG_BINRM 
GX_TG_TANGENT 























Xx Optional G ‘) Computed 
) Renormalize TexCoord 
Mat-Vec Mat-Vec 
Multiply Multiply 
Pos/Tex Mtx Mem Post-Transform Mtx Mem 


Whereas the HW1 texcoord computation path stops after the first matrix-vector multiply, HW2 adds an 
optional renormalization step followed by a second “post-transform’” matrix-vector multiply. 


There are various ways to make use of these extra features. You can choose to provide twice as many 
matrices for position transformation and use only the extra memory for texgens (you would select the iden- 
tity matrix for the first transformation). You can also use this feature to provide more efficient projected tex- 
tures. In this case, you use the position multiplied by the position matrix, then multiplied by a reprojection/ 
rescale matrix. This feature may also be used to make environment mapping easier. You use the normal 
multiplied by a regular normal matrix (but stored in the Pos/Tex Mtx memory), then renormalized and multi- 
plied by a post-transform matrix that rescales the normal into texture space. 


This feature is accessed by using GXSetTexCoordGen2. 


Code 120 - GXSetTexCoordGen2 


GxSetTexCoordGen2 ( 
GXTexCoordID dst_coord, 
GxXTexGenType func, 


GxXTexGenSrc src_param, 
u32 mtx, 
GXBool renormalize, 
u32 pt_mtx ); 
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15.3.4 Line (but not point) aspect ratio fixed for field-mode rendering 


In field-mode and double-strike rendering, points and lines should be rendered with a different aspect ratio 
than for frame rendering. This was fixed for lines in HW2, but not for points. You can set this feature 
through the use of GXSet FieldMode. When the half_aspect_ratio parameter is set to GX_ TRUE, lines are 
drawn at half the regular height. 





15.3.5 New TEV compare functions added 


The compare functionality of GxSet TevC LampMode has been removed. It has been replaced with new 
TEV compare functionality that is specified on HW2 by additional GXTevOp's. When one of these functions 
is specified, the TEV output equation is modified as follows: 


Equation 41 - Regular TEV output 

output = (d + ((1 -c) *a+c* b) + bias) * scale 
Equation 42 - Compare TEV output 

output = d + ((a OP b) ? c:0) 


You have a choice of 8-, 16-, or 24-bit wide compares, or an 8-bit per-component compare. When a com- 
pare operation is performed, the output scale factor can only be set to one, and the bias can only be set to 
zero. 


The following tables describe the different operations. For the 16- and 24-bit compares, the MSB is listed 
first. 


Table 16 - Color or alpha compare operations 


(Ofo} (0) mo) ay NI el at: MOxeyin] ey-1K:) Operation 











GX_TEV_COMP_R8_GT a[red] > b[red] ? 
GX_TEV_COMP_R8_EQ a[red] == b[red] ? 
GX_TEV_COMP_GR16_GT a[green, red] > b[green, red] ? 
GX_TEV_COMP_GR16_EQ a[green, red] == b[green, red] ? 
GX_TEV_COMP_BGR24_GT a[blue, green, red] > b[blue, green, red] ? 
GX_TEV_COMP_BGR24 EQ a[blue, green, red] == b[blue, green, red] ? 














Table 17 - Color-only compare operations 


Color-only Compare Operation 


GX_TEV_COMP_RGB8_GT per-component: a[component] > b[component] ? 





GX_TEV_COMP_RGB8_ EQ per-component: a[component] == b[component] ? 














Table 18 - Alpha-only compare operations 








PNT elar-treyal ym exe) ti) ex- 1-1 Operation 
GX_TEV_COMP_A8_GT a[alpha] > bf[alpha] ? 
GX_TEV_COMP_A8_ EQ a[alpha] == b[alpha] ? 
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15.3.6 More flexibility in TEV for texture and raster color component swaps 


In addition to being able to swap the texture color input to the TEV, you can now swap the rasterized color 
input. Plus, you can select which color channel will be used for each component on a component-by-com- 
ponent basis. This selection is done through a four-entry table. For each TEV stage, you may select an 
entry from the table for the raster color input and an entry for the texture color input. This selection is 
accomplished using the function GxSet TevSwapMode. The swap table is set up using the function 
GXSetTevSwapModeTable. 


Code 121 - GXSetTevSwapMode, GXSetTevSwapModeTable 


GXSetTevSwapMode ( 
GxXTevStageID stage, 
GxXTevSwapSel ras sel, 
GXTevSwapSel tex _sel ); 


GXSetTevSwapModeTable ( 
GxXTevSwapSel select, 
GXTevColorChan red, 
GXTevColorChan green, 
GXTevColorChan blue, 
GxXTevColorChan alpha ); 





15.3.7 New TEV “constant” color registers, component selectable 


Four new registers are available as TEV inputs. They can be treated as RGBA registers, or as a set of four 
scalar registers. These registers cannot be used as TEV outputs, and thus they are referred to as “con- 
stant” (or “konstant”) registers, although they can easily be modified by a GX set command. New actual 
constant values are provided in addition to the new register choices. 


The new color inputs are selected through a two-level selection system. First, using GXSetTevColoriIn 
(or Gk<SetTevAlphatIn), you select the GX_CC_KONST (or GX_CA_KONST). Then, using 
GXSetTevKColorSel (or GXSet TevKAlphaSe1) you select the constant selection desired for each TEV 
stage: 


Table 19 - Color and alpha constant register values 


























Color KONST values Alpha KONST values Description 
GX_TEV_KCSEL_1 GX_TEV_KASEL_1 1.0 scalar 
GX_TEV_KCSEL_7_8 GX_TEV_KASEL_7_8 7/8 scalar 
GX_TEV_KCSEL_3 4 GX_TEV_KASEL_3 4 3/4 scalar 
GX_TEV_KCSEL_5 8 GX_TEV_KASEL_5 8 5/8 scalar 
GX_TEV_KCSEL_1_ 2 GX_TEV_KASEL_1_2 1/2 scalar 
GX_TEV_KCSEL 3 8 GX_TEV_KASEL_3_8 3/8 scalar 
GX_TEV_KCSEL_1_4 GX_TEV_KASEL_1_4 1/4 scalar 
GX_TEV_KCSEL_1_ 8 GX_TEV_KASEL_1_8 1/8 scalar 
GX_TEV_KCSEL_KO -- KO RGB 
GX_TEV_KCSEL_K1 - K1 RGB 
GX_TEV_KCSEL_K2 -- K2 RGB 
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Table 19 - Color and alpha constant register values (Continued) 


Graphics Library (GX) 







































































Color KONST values Alpha KONST values Description 
GX_TEV_KCSEL_K3 -- K3 RGB 
GX_TEV_KCSEL_KO_R GX_TEV_KASEL_KO_R KO R scalar 
GX_TEV_KCSEL_K1_R GX_TEV_KASEL_K1_R K1 R scalar 
GX_TEV_KCSEL_K2_R GX_TEV_KASEL_K2_R K2 R scalar 
GX_TEV_KCSEL_K3_R GX_TEV_KASEL_K3_R K3 R scalar 
GX_TEV_KCSEL_KO_G GX_TEV_KASEL_K0_G KO G scalar 
GX_TEV_KCSEL_K1_G GX_TEV_KASEL_K1_G K1 G scalar 
GX_TEV_KCSEL_K2_G GX_TEV_KASEL_K2_G K2 G scalar 
GX_TEV_KCSEL_K3_G GX_TEV_KASEL_K3_G K3 G scalar 
GX_TEV_KCSEL_KO_B GX_TEV_KASEL_K0O_B KO B scalar 
GX_TEV_KCSEL_K1_B GX_TEV_KASEL_K1_B K1 B scalar 
GX_TEV_KCSEL_K2_B GX_TEV_KASEL_K2_B K2 B scalar 
GX_TEV_KCSEL_K3 B GX_TEV_KASEL_K3_B K3 B scalar 
GX_TEV_KCSEL_KO_A GX_TEV_KASEL_K0O_A KO A scalar 
GX_TEV_KCSEL_K1_A GX_TEV_KASEL_K1_A K1 A scalar 
GX_TEV_KCSEL_K2_A GX_TEV_KASEL_K2_A K2 A scalar 
GX_TEV_KCSEL_K3_A GX_TEV_KASEL_K3_A K3 A scalar 














The command GXSetTevKColor may be used to modify the values of the four new constant color regis- 
ters. GXSet TevKColor works the same way as GXSetTevColor. Unlike the existing TEV registers, the 
new registers have only 8 bits per component. 


15.3.8 


HW2 provides a new “blend” operation which allows the computation: 


Subtractive “blend” mode 


Equation 43 - Subtractive blend operation 


final pixel color = destination color - source color 


You select this blend operation using GkSetBlendMode and choosing a blend_mode of 


GX_ 
Note: 


BM SUBTRACT. 


Unlike additive blending, you cannot specify source and destination factors (coefficients) in this 


equation. This blend operation is also available with writes from the CPU to the EFB, and select- 
able using GXPokeBlendMode. 
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15.3.9 New texture copy types (for both color and Z) 


HW2 provides new copy-texture output formats for the GxCopyTex operation. These are available through 
the use of new GXTexFmt enumerated values, which are only valid for use with GxSet TexCopyDst. 


Table 20 - New GXTexFmt enumerated values 


Value | DY-X-Yeq ale) dfoy a} 
































GX_CTF_R4 4b red channel 
GX_CTF_RA4 4b red + 4b alpha 
GX_CTF_RA8 8b red + 8b alpha 
GX_CTF_A8 8b alpha 
GX_CTF_R8 8b red 
GX_CTF_G8 8b green 
GX_CTF_B8 8b blue 
GX_CTF_RG8& 8b red + 8b green 
GX_CTF_GB8 8b green + 8b blue 
GX_CTF_Z4 4b Z 
GX_CTF_Z8M 8b Z (middle byte) 
GX_CTF_Z8L 8b Z (lower byte) 
GX_CTF_Z16L 16b Z (lower part) 





15.3.10 Scissor box offset 


HW2 provides a means of shifting the area of the scissor box within the space of the EFB memory. Nor- 
mally, the upper-left corner of the scissor box maps to the same corner in EFB space. You can now specify 
an offset that is subtracted from the computed pixel’s location before it is stored in the EFB. Thus, you can 
shift the area of the scissor box up and/or left within the EFB space. 


The main purpose of this feature is to simplify dual-pass rendering for antialiasing. You can now maintain 
the same viewport for rendering the upper and lower halves of the screen. The upper half is drawn with the 
scissor box set to the upper half of the viewport and the offset set to zero. The lower half is drawn by mov- 
ing the scissor box to the bottom half of the viewport and adjusting the offset to place the scissor box area 
within the EFB’s valid area (since the EFB is only half the screen height in antialiased frame-rendering 
mode). 


The offset is adjusted using the function GxSetScissorBoxOffset: 
Code 122 - GXSetScissorBoxOffset 


GXSetScissorBoxOffset( u32 xoff, u32 yoff ); 
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16 Limitations 


This chapter outlines features that disable other features, or only work in restricted cases. 


16.1 Antialiasing 
* Antialiasing can use pixel format GX_PF_RGB565_ 216 only. 


¢ At peak, antialiased rendering runs at half the maximum fill rate of non-antialiased rendering, or 400 
megapixels/second. (Beyond this reduction in the peak rate, the formula for computing the antialiased 
rendering speed is the same as for non-antialiased rendering; see "9 Texture environment (TEV)" on 
page 89 for more details.) 


« Z textures cannot be copied from an antialiased frame buffer. 


¢ Dithering doesn’t work with antialiasing. 


16.2 CPU access to the frame buffer 

The application must synchronize CPU access to the Embedded Frame Buffer (EFB) with normal render- 
ing to the EFB. 

16.3 Display lists 


When creating a display list at runtime by calling GX functions bracketed by GkxBeginDisplayList/ 
GXEndDisplayList, the following functions may not be used: 





* GXBeginDisplayList. 





* GXEndDisplayList. 

* GXCallDisplayList. 

In addition, the following types of functions cannot be placed inside of a display list: 

* GXInit* 

* GXRead* 

* GXPeekARGB/GXPeekZ 

* GXPokeARGB/GXPokeZ 

* GXxXGet* 

These may be executed while a display list is being created; however, they will not be put into the display 
list. 

16.4 Vertex performance 

Vertex performance depends on the lighting and texture coordinate features selected. See the “Vertex Per- 
formance Calculator” page in the Dolphin Reference Manual (HTML). 

16.5 Matrix memory 

Position and texture matrices share the same internal matrix memory. Normal matrices are stored in a sep- 
arate memory. 

16.6 Texture 


Color index textures cannot be trilinearly-filtered. 
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Mipmaps using texel format GX_TF_RGBA8 require two cycles to filter. These cycles are internal to the tex- 
ture filter hardware and do not effect the total number of TEV stages available. 
16.7 Blending and logic operations 


You must choose between blending and logical operations; you cannot do both at the same time. 


16.8 Sharing main memory resources 


While the embedded frame buffer (EFB) eases the bandwidth requirements of the main memory, there are 
still several large main memory bandwidth hogs in the system, including the CPU, vertex input to the GP, 
texture fetches, and copies from the EFB. 
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17 Comparison to the Nintendo 64 


17.1 Display lists vs. immediate mode 


The Nintendo 64 (N64) used a double-buffered display list method to pass graphics data from the CPU to 
the graphics processor (Reality Engine). The Nintendo GameCube system provides an immediate-mode 

interface on initialization, but may be configured to use the double-buffered display list method as well. In 
general, immediate mode is an easier way to bring up new applications, but display lists can provide more 
performance in certain cases. 


17.2 No microcoded processor 


N64 had a microcoded geometry processor called the RSP. This provided some flexibility to optimize 
microcode for a particular game. However, the relatively small instruction memory for the RSP reduced the 
benefit somewhat. 


Nintendo GameCube’s transform and lighting processor is hardwired, but very fast and very pipelined. 
Although it is not a programmable processor, Nintendo GameCube’s transform and lighting hardware is 
highly configurable and programmable, and it provides special features that would require many vector 
CPU cycles to emulate. 


N64 could direct the result of RSP processing (setup polygons) to a DRAM FIFO, which was read by the 
display processor (RDP). By contrast, Nintendo GameCube has an internal FIFO (about 100 full-sized 
primitives deep) between the geometry processor and the display processor and cannot save the interme- 
diate results to main memory. 


17.3. Vertex buffer 


The N64 used a vertex buffer in the DMEM of the RSP to assemble primitives. The vertices were trans- 
formed during the load, and then could be used to draw several triangles. The vertex buffer was small, and 
required frequent loading. Model data had to be converted into chunks that fit into the vertex cache, which 
sometimes meant reloading some vertices. Moreover, the vertex format was relatively fixed; it required a 
microcode change to change the vertex format. 


Nintendo GameCube, on the other hand, either loads vertex data directly into a streaming buffer, or 
indexes vertex data through a vertex cache. The vertex data is transformed each time it is used in a primi- 
tive. Nintendo GameCube also supports triangle strip and triangle fan primitives, which helps reduce the 
vertex/primitive ratio. Thanks to its vertex cache, Nintendo GameCube only loads new vertex data when it 
is not in the cache, reducing DRAM bandwidth. Because each component of a vertex can be indexed sep- 
arately, only the unique data for each component needs to be stored in main memory. 

Nintendo GameCube supports a flexible vertex format, both in terms of grouping attributes together, and in 
quantizing attributes to reduce memory and bandwidth requirements. 


17.4 Textures 


The N64 had a small Texture Memory (TMEM), and only textures in that memory could be used during ren- 
dering. Nintendo GameCube, on the other hand, has a large 1MB TMEM that can be used as multiple tex- 
ture caches; it can also be preloaded with textures like the N64. The TMEM is entirely separate from the 
embedded frame buffer (EFB). 


Texture caches allow you to use large textures for rendering without loading the entire texture in TMEM. In 
fact, only the texels that are visible and not already in the cache actually get loaded. Texture caches also 
allow the entire main memory to be used as texture memory. Texture caches are more efficient when used 
with mipmaps. Nintendo GameCube supports single-cycle mipmap filtering, whereas N64 required two 
cycles for mipmap filtering. Nintendo GameCube also supports anisotropic texture filtering and various 
Level of Detail controls. 


© 2006 Nintendo RVL-06-0037-001-A 
CONFIDENTIAL Released: February 27, 2006 


166 Graphics Library (GX) 


Nintendo GameCube supports compressed textures. Compressed textures use only 4 bits/texel and have 
color quality comparable to 16- to 24-bit uncompressed textures. Compressed textures only support a 1-bit 
alpha. Nintendo GameCube also supports color index textures. Color lookup tables can range from 16 to 

16,000 entries, by powers of 2. Color index textures can be 4, 8, or 16 (14 bits usable) bits/texel. 


Nintendo GameCube supports Z buffering before texture application, which means that no texture will be 
accessed unless the pixel is visible. This helps reduce texture bandwidth in scenes that have high depth 
complexity. 


17.5 Winding order 


Nintendo GameCube defines frontfacing polygons as having a clockwise vertex order. N64 had the oppo- 
site order; i.e., frontfacing polygons had a counter-clockwise vertex order. You can use the following code 
to make Nintendo GameCube’s winding order the same as N64’s: 


Code 123 - Aligning Nintendo GameCube winding order with N64 


#define FRONT GX_CULL BACK 
#define BACK GX_CULL_FRONT 
// use FRONT/BACK in your code 


17.6 Video scaling 
N64 scaled the video image as an operation on the display buffer (processing done by the VI). 


Nintendo GameCube performs vertical scaling during the copy from EFB to the external frame buffer 
(XFB). Nintendo GameCube performs horizontal scaling in the video (VI) hardware as the XFB is being 
read for display. 


17.7 Antialiasing 


The N64 used a coverage-based method for edge antialiasing. This method had the advantage of using 
only a few additional bits of frame buffer memory, but had many artifacts and, as a result, complicated Z 
buffering and many of the blending functions. 


Nintendo GameCube uses a sub-sampling method of antialiasing. This method uses twice the memory of 
a non-antialiased frame buffer, but it is orthogonal with most of the Graphics Processor features and 
results in better image quality. 


17.8 Coplanar polygons 


On the N64, coplanar polygons (also called geometric decals) used a fuzzy Z-compare method. This 
method was prone to breaking when the decal was in the distance. 


Nintendo GameCube implements coplanar polygons by defining a “reference plane.” The Z coefficients 
computed at setup are locked so that subsequent triangles use the reference plane’s coefficients, guaran- 
teeing that the same Z is produced at the same pixel. Reference planes may be invisible, if necessary. 


17.9 FREE Z buffering, FREE blending 


N64 kept its Z buffer and color buffer in main memory. As a result, applications that used Z buffering also 
used more main memory bandwidth. Applications that did read/modify/write operations on the color buffer 
used more main memory bandwidth than applications that just read the color buffer. 
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Nintendo GameCube’s embedded frame buffer (EFB) is a high-speed embedded 1T SRAM memory dedi- 
cated to the color and Z buffers. This memory has been designed so that the pipeline can render at full 
speed with both Z buffering and color read/modify/write operations enabled. Since this memory is embed- 
ded in the Graphics Processor, no main memory bandwidth is used. Also, the Texture Memory (TMEM) is 
physically separate from the EFB, so texture accesses can occur simultaneously with frame buffer 
accesses. 

In general, it is still a good idea when Z buffering to roughly sort 3D objects from nearest to farthest away. 
This sort will reduce the number of texture fetches necessary for opaque objects; therefore, enable Z buff- 
ering before texturing, especially for scenes with high depth complexity. 
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18 Comparison to OpenGL 


18.1 Vertex description 


OpenGL allows a free-form vertex description, and has the notion of current vertex state. If some vertex 
component is not specified, then the current vertex state for that component is used. The format of the ver- 
tex is specified between g1Begin/glEnd. This implies that each component of data has extra information 
associated with it that specifies its type. 





The Nintendo GameCube, on the other hand, specifies the format for every vertex between GxBegin/ 
GXEnd ahead of time and sends this information to the Graphics Processor. Every vertex between 
GXBegin/GXEnd has the same format. Therefore, there is only vertex data produced between GxBegin/ 
GXEnd. The order of data in a vertex is also specified by the API. These limitations are minor, but allow 
simpler and faster hardware implementation and lower bandwidth (no type for each data component). 








OpenGL 1.1 has the concept of vertex arrays, in which arrays of vertex structures can be indexed to create 
a primitive. Nintendo GameCube supports arrays of vertex components, requiring an index for each com- 
ponent. This allows greater compression of model data. Nintendo GameCube can also support arrays of 
vertex structures. In addition, Nintendo GameCube contains a vertex cache that caches each component 
individually. In Nintendo GameCube, the Graphics Processor de-references the index, rather than the 
CPU. The vertex cache enables a natural and flexible data representation, and lowers bandwidth require- 
ments. 


Nintendo GameCube can also mix together indexes and data in the command stream for maximum flexi- 
bility. 


18.2 Matrices 


OpenGL includes several matrix stacks that the application can manipulate. Nintendo GameCube, on the 
other hand, has a separate library, MTX, which can be used to create and manipulate matrix stacks. The 
MTX library uses the CPU to do calculations and stores matrices in main memory. The GX API expresses 
more directly the capability of the Graphics Processor. The Graphics Processor has an internal matrix 
memory that can be indexed per-object or per-vertex. The GX API provides a way to load this matrix mem- 
ory and set matrix indexes. 


18.3 Lighting 


OpenGL supports a per-vertex lighting model with emissive, ambient, specular, diffuse, and spotlight 
effects. Many lights can contribute to the final vertex color. OpenGL supports two-sided lighting. 


Nintendo GameCube supports lights with ambient, diffuse, specular, and spotlight effects for per-vertex 
lights. Nintendo GameCube supports local lights with distance attenuation in hardware. 

Nintendo GameCube can use the per-vertex attenuation to attenuate light textures. The GX API supports 
an arbitrary number of lights, but the Graphics Processor supports only four lights at a time. Therefore, 
Nintendo GameCube introduces the concept of light channels (up to four) in which a vertex color compo- 
nent can be associated with one or more lights. These channels can be used as the vertex color, or they 
can be passed down the pipeline to use in per-pixel calculations. In general, Nintendo GameCube com- 
bines the capabilities of per-vertex lighting and texture lighting to make a more powerful lighting model. 
Nintendo GameCube also allows CPU-computed lights and pre-lighting to be merged with lighting com- 
puted by the Graphics Processor. 
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18.4 Texture coordinate generation 


OpenGL supports planar and spherical projections to generate texture coordinates. The 

Nintendo GameCube Graphics Processor (GP) supports transforming positions, normals, or texture coor- 
dinates by an arbitrary matrix to generate texture coordinates. Nintendo GameCube also has a special 
mode to generate texture coordinates from the results of per-vertex lighting. Bump mapping is another 
special type of texture coordinate generation that is supported by Nintendo GameCube. These are all 
methods supported by hardware—the CPU can also be used to generate texture coordinates based on 
other algorithms. 


18.5 Texture and multi-texture 


OpenGL has recent support for multi-texture. Nintendo GameCube’s default texture configuration is very 
similar to the OpenGL method. However, Nintendo GameCube has a much more configurable texture 
pipeline than that specified by OpenGL. Input texture coordinates (up to eight), generated texture coordi- 
nates (up to eight), textures (up to eight), and texture environment stages (up to 16) can all be arbitrarily 
associated with each other. This allows for more flexibility in setting up a lighting equation, for example, or 
to separate global and local game state. 


Nintendo GameCube also supports sophisticated indirect-texture effects, Z textures, and texture creation 
using the Graphics Processor. 
18.6 Polygon offset 


OpenGL supports the offsetting of polygon depth to prevent Z-fighting among coplanar polygons. 
Nintendo GameCube uses a “reference plane” method to guarantee that coplanar faces generate exactly 
the same depth for each pixel. The reference plane may be invisible. 
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Appendix A. GX API functions 
This appendix lists the GX API. 


A.1 


Code 124 - Cpu2Efb 


void 


void 


void 


void 


void 


void 


void 


void 


void 


void 


void 


void 


Cpu2Efb 


GX Pee kARGB ( 


GXPeekzZ ( 


GXPokeAlphaMode ( 


GXPokeAlphaRead ( 


ul6é 
ul6é 
u32* 


ul6é 
ul6é 
£32% 


GxXCompare 
us 


x, 
Yr 
color ) 


x, 


Yr 


func, 
threshold ) 


GXAlphaReadMode mode ) 


GXPokeAlphaUpdate( GXBool update enable ) 


GXPokeARGB ( 


GXPokeBlendMode ( 


ul6é 
ul6é 
u32 


GXBlendMode 
GXBlendFactor 
GXBlendFactor 


GXLogicOp 


Xx, 
Yr 
color ) 


type, 
src_factor, 
dst_factor, 


op ) 


GXPokeColorUpdate( GXBool update enable ) 


GxPokeDither( GXBool dither ) 


GXPokeDstAlpha ( 


GXPokeZ ( 


GXPokeZMode ( 
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GXBool 
us 


ul6é 
ul6é 
£32 


GXBool 
GxXCompare 


enable, 
alpha ) 


x, 


Yr 


compare enable, 
func ) 
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A.2 Culling 
Code 125 - Culling 


void GXGetCullMode( GXCullMode* mode ) 
void GXGetScissor ( 
ms2e -eO gy 
u32* yOrig, 
u32* wd, 
u3s2* hts 3) 
void GXSetCoPlanar( GXBool enable ) 


void GXSetCullMode( GXCullMode mode ) 


void GXSetScissor ( 


u32 xOrig, 
u32  yOrig, 
u32_ wd, 
u32 ht ) 


A.3  DisplayList 
Code 126 - DisplayList 


void GxXBeginDisplayList ( 
void* list, 
u32 size ) 

void GxXCallDisplayList ( 
void* list, 


u32 nbytes ) 


u32 GXEndDisplayList( void ) 


A.4 Draw 
Code 127 - Draw 


void GXDrawCube( void ) 

void GXDrawCylinder( u8 numEdges ) 
void GXDrawDodeca( void ) 

void GXDrawIcosahedron( void ) 


void GXDrawOctahedron( void ) 





void GXDrawSphere ( 
u8 numMajor, 
u8 numMinor ) 


void GXDrawSpherel( u8 depth ) 


void GXDrawTorus ( 


£32 Boy 
u8 numc, 
u8 numt ) 
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A.5 Framebuffer 
Code 128 - Framebuffer 


void GXAdjustForOverscan ( 
GXRenderModeObj* 
GXRenderModeObj* 
ul6 
ul6 


void GXClearBoundingBox( void ) 


void GXCopyDisp ( 
void* 
GXBool 


void GxXCopyTex ( 
void* 
GXBool 


void GXReadBoundingBox ( 
ul6* 
ul6* 
ul6* 
ul6* 


void GxXSetCopyClamp( GXFBClamp clamp 


void GXSetCopyClear ( 
GxXColor 
u32 


void GXSetCopyFilter ( 
GXBool 
us 
GXBool 
us 


void GxXSetDispCopyDst ( 
ul6é 
ul6é 


rmin, 
rmout, 
hor, 

ver ) 


dest, 
clear ) 


dest, 
clear ) 


left, 
top, 
right; 
bottom ) 


) 


clear_clr, 
clear _z ) 


aa, 

sample pattern[12] [2], 
vf, 

vfilter[7] ) 


wd, 
ht ) 


void GXSetDispCopyFrame2Field( GXCopyMode mode ) 


void GxXSetDispCopyGamma( GXGamma gamma ) 


void GxXSetDispCopySrc ( 
ul6é 
ul6é 
ul6é 
ul6é 


u32 GXSetDispCopyYScale( £32 yscale 


void GxXSetTexCopyDst ( 
ul6 
ul6 
GXTexFmt 
GXBool 


void GxXSetTexCopySrc ( 
ul6é 
ul6é 
ul6é 
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left, 
top, 
wd, 

ht ) 


wd, 
ht, 
fmt, 
mipmap ) 


left, 
top, 
wd, 
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ule he.) 
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A.6 Geometry 
Code 129 - Geometry 


void GXBegin ( 
GXPrimitive type, 
GXVtxFmt vtxfmt, 
ul6 nverts ) 


void GxXClearVtxDesc( void ) 


u8 r, u8 b, uB b, ub a ); 
wer, Weg. ue boyy 

u32° cle *) 7 

ule cin -)s 

ul6 index ); 

u8 index ) 


void GxColor4u8 
void GXColor3u8 
void GXColorlu32 
void GXColorlul6 
void GXColor1x16 
void GxXColor1x8 


( 
( 
( 
( 
( 
( 





void GXEnableTexOffsets ( 
GXTexCoordID coord, 
GXBool line enable, 
GXBool point _enable ) 


void GXEnd( void ) 


void GxXGetArray ( 
GXAttr attr, 
void* base ptr, 
u8 stride ) 


void GxXGetLineWidth ( 
u8* width, 
GxXTexOffset* tex_offsets ) 


void GXGetPointSize ( 
u8* size, 
GxXTexOffset* tex_offsets ) 


void GXGetVtxAttrFmt ( 
GXVtxFmt vtxfmt, 
GXAttr attr, 
GXCompCnt* cnt, 
GXCompType* type, 
u8* frac ) 


void GxXGetVtxAttrFmtv ( 
GXVtxFmt vtxfmt, 
GXVtxAttrFmtList* list ) 


void GXGetVtxDesc ( 
GxXAttr attr, 
GxXAttrType* type 


void GXGetVtxDescv( GXVtxDescList* attr list ) 
void GXInvalidateVtxCache( void ) 


void GxXMatrixIndexlu8 ( u8 index ); 
void GXMatrixIndex1x8 ( u8 index ) 


void GXNormal3f32 
void GXNormal3s16 
void GXNormal3s8 

void GXNormal1x16 


£32 2x50 <£32° Vo £32 22. NG 
SiG) x; SLO v7? S16. Zz *) 7 
s8 x, s8 y, s8 z ); 
ul6 index ); 





( 
( 
( 
( 
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void 


void 


void 


void 


void 


void 


void 


void 


void 


void 


void 
void 
void 
void 
void 
void 
void 
void 
void 
void 
void 





void 


GXNormallx8 ( u8 index ) 


GxSetArray ( 
GxXAttr 
void* 
us 
GXSetLineWidth ( 
us 
GXTexOffset 
GXSetNumTexGens( u8 nTexGens 


GxSetPointSize ( 
us 
GxTexOffset 
GxXSetTexCoordGen ( 
GXTexCoordID 
GxTexGenType 
GXTexGenSrc 
u32 
GXSetVtxAttrFmt ( 
GXVtxFmt 
GxXAttr 
GXCompCnt 
GXCompT ype 
us 
GXSetVtxAttrFmtv ( 
GXVtxFmt 
GXVtxAttrFmtList* 
GXSetVtxDesc ( 
GxXAttr 
GxAttrType 
GXSetVtxDescv( GXVtxDescList* 
GxXTexCoord2f32( £32 s, £32 t 
GXTexCoord2ul6( ul6 s, ul6 t 
GXTexCoord2s16( sl6 s, sl6 t 
GXTexCoord2u8 ( u8 s, u8 t ); 
GXTexCoord2s8 ( s8 s, s8 t ); 
GxTexCoord1f32( £32 s, £32 t 
GXTexCoordlul6( ul6 s, ul6 t 
GXTexCoord1s16( sl6 s, slét 
GXTexCoordlu8 ( u8 s, u8 t ); 
GXTexCoord1s8 ( s8 s, s8 t ); 
GXTexCoord1x16( ul6 index ); 
GXTexCoord1x8 ( u8 index ) 
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attr, 
base ptr, 
stride ) 


width, 
tex offsets ) 


) 


size, 
tex_offsets ) 


dst_coord, 
func, 
src_param, 
mtx ) 


vtxfmt, 
attr, 
cnt, 


type, 
frac ) 


vtxfmt, 
list ) 


attr, 
type ) 


attr_list ) 
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A.7 GfxFIFO 
Code 130 - GfxFIFO 


void GXDisableBreakPt( void ) 

void GxXEnableBreakPt( void* break pt ) 
GXFifoObj* GXGetCPUFifo( void ) 
OSThread* GXGetCurrentGXThread ( void ) 
void* GXGetFifoBase( GXFifoObj* fifo ) 


void GxGetFifoLimits ( 
GXFifoObj* fifo, 
u32* hia, 
u32* lo ) 


void GxXGetFifoPtrs ( 
GXFifoObj* fifo, 
void** read ptr, 
void** write ptr ) 


u32 GXGetFifoSize( GXFifoObj* fifo ) 


void GxXGetFifoStatus ( 
GXFifoObj* fifo, 

GXBool* overhi, 

GXBool* underlow, 
u32* £ifo. cnt; 

GXBool* cpu_write, 

GXBool* gp read, 

GXBool* fifowrap ) 


GXFifoObj* GXGetGPFifo( void ) 


void GXGetGPStatus ( 
GXBool* overhi, 
GXBool* underlow, 
GXBool* readIdle, 
GXBool* cmdIdle, 
GXBool* brkpt ) 


u32 GXGetOverflowCount( void ) 


void GXInitFifoBase ( 
GXFifoObj* fifo, 
void* base, 
u32 size ) 


void GxXInitFifoLimits ( 
GXFifoObj* fifo, 
u32  hi_water_mark, 
u32 lo water mark ) 


void GXInitFifoPtrs ( 
GXFifoObj* fifo, 
void* read ptr, 
void* write ptr ) 


u32 GXResetOverflowCount( void ) 


void GXSaveCPUFifo( GXFifoObj* fifo ) 
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void GXSetCPUFifo( GXFifoObj* fifo ) 
void GXSaveGPFifo( GXFifoObj* fifo ) 
void GXSetGPFifo( GXFifoObj* fifo ) 


OSThread* GXSetCurrentGxXThread ( void ) 
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Indirect 


A.8 Indirect 
Code 131 - Indirect 


void GXSetIndTexCoordScale ( 
GXIndTexStageID 
GXIndTexScale 
GXIndTexScale 


void GxXSetIndTexMtx ( 
GXIndTexMtxID 
£32 
s8 


void GxXSetIndTexOrder ( 
GXIndTexStageID 
GxXTexCoordID 
GxXTexMapID 


void GxXSetNumIndStages( u8 nstages 


179 


ind_stage, 
scale s, 
scale t ) 


mtx_sel, 
offset_mtx[2] [3], 
scale exp ) 


ind_stage, 
tex_coord, 
tex_map ) 


void GxXSetTevDirect( GXTevStageID tev_stage ) 


void GXSetTevIndBumpST ( 
GXTevStageID 
GXIndTexStageID 
GXIndTexMtxID 


void GxSetTevIndBumpXyYZ ( 
GXTevStageID 

GXIndTexStageID 

GXIndTexMtxID 


void GxXSetTevIndirect ( 
GXTevStageID 
GXIndTexStageID 
GXIndTexFormat 
GXIndTexBiasSel 
GXIndTexMtxID 
GXIndTexWrap 
GXIndTexWrap 
GXBool 

GXBool 
GXIndTexAlphaSel 





tev_stage, 
ind_stage, 
matrix_sel ) 


tev_stage, 
ind_stage, 
matrix_sel ) 


tev_stage, 
ind_stage, 
format, 
bias sel, 
matrix_sel, 
wrap_s, 
wrap t, 
add_prev, 
utc_lod, 
alpha_sel ) 


void GXSetTevIndRepeat ( GXTevStageID tev_stage ) 


void GxXSetTevindTile ( 
GXTevStageID 
GXIndTexStageID 
ul6 
ul6 
ul6 
ul6 
GXIndTexFormat 
GXIndTexMtxID 
GXIndTexBiasSel 
GXIndTexAlphaSel 





void GxXSetTeviIndWarp ( 
GXTevStageID 
GXIndTexStageID 
GXBool 
GXBool 


© 2006 Nintendo 
CONFIDENTIAL 


tev_stage, 
ind_stage, 
tilesize s, 
tilesize t, 
tilespacing s, 
tilespacing t, 
format, 
matrix_sel, 
bias sel, 
alpha_sel ) 


tev_stage, 
ind_stage, 
signed offsets, 
replace mode, 
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GXIndTexMtxID matrix_sel ) 
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Lighting 


A.9 Lighting 
Code 132 - Lighting 


void GxGetLightAttnA ( 
GXLightObj* 
Eo2” 
£32* 
ES2* 


void GxXGetLightAttnkK ( 
GXLightObj* 
£32% 
ES2* 
£32* 


void GxXGetLightColor ( 
GXLightObj* 
GxXColor* 


void GxGetLightDir ( 

GxXLightObj* 
f32% 
£32% 
£32 * 


#define GxXGetLightDirv(lo, vec) \ 
(GXGetLightDir((lo), (f£32*) (vec), 


void GxGetLightPos ( 

GXLightObj* 
£32* 
£S2* 
£32" 


#define GXGetLightPosv(lo, vec) \ 
(GXGetLightPos((lo), (f£32*) (vec), 


void GxXInitLightAttn ( 

GXLightObj* 
£32 
£32 
£32 
£32 
£32 
£32 


void GxXInitLightAttnA ( 

GXLightObj* 
£32 
£32 
£32 


void GxXInitLightAttnk ( 

GXLightObj* 
£32 
£32 
£32 


void GxXInitLightColor ( 
GXLightObj* 
GxXColor 


void GXInitLightDir ( 
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1lt_obj, 


ad, 
al, 
a2 ) 


1lt_obj, 


k0, 
kl, 
k2 ) 


1lt_obj, 


color 


) 


1lt_obj, 


nx, 


ny, 
nz ) 


(£32*) (vec)+1, 


1lt_obj, 


x, 


Yr 
Zi) 


(£32*) (vec)+1, 


1lt_obj, 


ad, 
al, 
a2, 
ko, 
kl, 
k2 ) 


1lt_obj, 


ad, 
al, 
a2 ) 


1lt_obj, 


k0, 
kl, 
k2 ) 


1lt_obj, 


color 


) 


(£32*) (vec)+2)) 


(£32*) (vec)+2)) 
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GXLightObj* 1t_obj, 


£32 nx, 
£32 -nyy 
£32) nz) 


#define GXInitLightDirv(lo, vec) \ 
(GXInitLightDir((lo), *(£32*) (vec), *((£32*) (vec)+1), *((£32*) (vec) +2)) 
void GXInitLightDistAttn ( 
GXLightObj* I1t_obj, 
£32 ref distance, 
£32 ref brightness, 
GXDistAttnFn dist func ) 


void GXInitLightPos ( 
GXLightObj* 1t_obj, 


£32) Ux, 
£32 y, 
£32 2z) 


#define GXInitLightPosv(lo, vec) \ 
(GXInitLightPos((lo), *(£32*) (vec), *((£32*) (vec)+1), *((£32*) (vec) +2) ) 
#define GXInitLightShininess(lobj, shininess) \ 
GXInitLightAttn(lobj, 0.0F, 0.0F, 1.0F, \ 
(shininess)/2.0F, 0.0F, \ 
1.0F-(shininess)/2.0F ) 


void GXInitLightSpot ( 
GXLightObj* 1t_obj, 
£32 cutoff, 
GXSpotFn spot func ) 


void GXInitSpecularDir ( 
GXLightObj* 1t_obj, 


£32 nx, 
ES: avy 
£32 nz ) 


void GXInitSpecularDirv( lo, vec ) \ ( 
GXInitSpecularDir((lo), *(£32*) (vec), *((£32*) (vec)+1), *((£32*) (vec) +2)) 


void GXLoadLightObj Imm ( 
GXLightObj* I1t_obj, 
GXLightID light ) 


void GXLoadLight0Obj Indx ( 
u32 1t_obj_indx, 
GXLightID light ) 


void GxXSetChanAmbColor ( 
GXChannelID chan, 
GxColor amb_color ) 


void GXSetChanCtrl ( 
GXChannelID chan, 
GXBool enable, 
GxColorSrce amb_src, 
GxColorSre mat_src, 
GXLightID light_mask, 
GxXDiffuseFn diff fn, 
GXAttnFn attn_fn ) 


void GXSetChanMatColor ( 
GXChannelID chan, 
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Management 


GxColor mat_color ) 


void GXSetNumChans( u8 nChans ) 


A.10 Management 
Code 133 - Management 
void GXAbortFrame( void ) 
void GXDrawDone( void ) 
void GXFlush( void ) 
GXFifoObj* GXInit ( 
void* base, 
u32 size 
void GXPixModeSync( void ) 
ul6 GXReadDrawSync( void ) 


void GxXSetDrawDone( void ) 


typedef void (*GXDrawDoneCallback) (void); 
GXDrawDoneCallback GXSetDrawDoneCallback ( 


void GXSetDrawSync( ul6 token ) 


typedef void (*GXDrawSyncCallback) (ul6 token); 
GXDrawSyncCallback cb ) 


GxXDrawSyncCallback GXSetDrawSyncCallback ( 


void GxSetVerifyLevel( u8 level ) 


void GxXTexModeSync( void ) 


void GXWaitDrawDone( void ) 
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) 


GXDrawDoneCallback cb 
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A.11 Performance 

Code 134 - Performance 

u32 GXGetPerfMetric( GxXPerf perf 
u32 GXReadClksPerVtx( void ) 

u32 GXReadGPOMetric( GxXPerf0 perf0 
u32 GXReadGPlMetric( GXPerfl perfl 


void GXReadGPMetric ( 
GXPerf0 
u3s2* 
GXPerfl 
u32* 


u32 GXReadMemMetric( GXMem perf ) 


void GXReadPixMetric ( 
u32* 
us2* 
u32* 
u32* 
u32* 
u32* 


void GXReadVvCacheMetric ( 
GxXVCachePerf 
u32* 
u32* 
u32* 
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perf0, 
ento, 
perfil, 
cntl ) 


top_pixels_in, 
top pixels out, 
bot_pixels_in, 
bot_pixels out, 
clr pixels in, 
copy _clks ) 


attr, 
check, 
miss, 
stall ) 
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PixelProc 


A.12 PixelProc 
Code 135 - PixelProc 


void 


void 


void 


void 


void 


void 


void 


void 


void 


void 


void 


void 


void 


GXInitFogAdjTable ( 
GXFogAdjTable* 
ul6é 
£32 


table, 
width, 
projmtx[4][4] ) 


GxXSetAlphaUpdate( GXBool update enable ) 


GxXSetBlendMode ( 
GXBlendMode 
GXBlendFactor 
GXBlendFactor 
GXLogicOp 


type, 
src_factor, 
dst_factor, 
op ) 


GxXSetColorUpdate( GXBool update enable ) 


GxXSetDither( GXBool dither ) 


GXSetDstAlpha ( 
GXBool 
us 


GXSetFieldMask ( 
GXBool 
GXBool 


GXSetFieldMode ( 
GXBool 
GXBool 


GxXSetFog ( 

GXFogType 
£32 
£2 
£32 
£32 

GxXColor 


GXSetFogRangeAdj ( 
GXBool 
ul6 
GXFogAdjTable* 


GxSetPixelFmt ( 
GXPixelFmt 
GXZFmt16 


enable, 
alpha ) 


odd_mask, 
even _mask ) 


field_mode, 
half_aspect_ratio ) 


type, 
startz, 
endz, 
nearz, 
farz, 
color ) 


enable, 
center, 
table ) 


pix_fmt, 
z_fmt ) 


GXSetZCompLoc( GXBool before _tex ) 


GXSetZMode ( 
GXBool 
GxXCompare 
GXBool 
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compare enable, 
func, 
update enable ) 
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A.13 Tev 
Code 136 - Tev 


void GXSetAlphaCompare ( 
GXCompare 
us 
GXAlphaOp 
GXCompare 
u8 


void GxXSetNumTevStages( u8 nStages 


void GxXSetTevAlphalIn ( 
GxXTevStageID 
GXTevAlphaArg 
GXTevAlphaArg 
GXTevAlphaArg 
GXTevAlphaArg 


void GXSetTevAlphaOp ( 
GXTevStageID 
GxXTevOp 
GXTevBias 
GxTevScale 
GXBool 
GxXTevRegID 


void GxXSetTevClampMode ( 
GXTevStageID 
GxTevClampMode 





void GXSetTevColor ( 
GxXTevRegID 
GxXColor 


void GxXSetTevColortIn ( 
GXTevStageID 
GXTevColorArg 
GXTevColorArg 
GXTevColorArg 
GXTevColorArg 


void GxXSetTevColoroOp ( 
GXTevStageID 
GxXTevOp 
GXTevBias 
GxTevScale 
GXBool 
GxXTevRegID 


void GxSetTevColorS10 ( 


iw) 


GxTevReg 
GXColorsS 


oO 


void GXSetTevOp ( 





GXTevStageID 
GXTevMode 


void GXSetTevOrder ( 
GXTevStageID 
GXTexCoordID 
GxTexMapID 
GXChannelID 
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compO, 
refo, 
OP, 

compl, 
refl ) 


stage, 
a, 
b, 
Cc, 


stage, 
Op, 

bias, 
scale, 
clamp, 
out_reg ) 


stage, 
mode ) 


iady 
color ) 


stage, 
a, 
Dy 
Cc, 


stage, 
Op, 

bias, 
scale, 
clamp, 
out_reg ) 


iedy 
color ) 


id, 
mode ) 


stage, 
coord, 
map, 

color ) 
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void GxXSetZTexture ( 
GXZTexOp op, 
GXTexFmt fmt, 
use bias) 
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A.14 Texture 
Code 137 - Texture 


u32 GxXGetTexBufferSize ( 
ul6é 
ul6é 
u32 
GXBool 
us 


void GXGetTexObjA11 ( 
GXTex0Obj* 
vyoid** 
ul6* 
ul6* 
GxXTexFmt* 
GxXTexWrapMode* 
GxTexWrapMode* 
GXBool* 


void* GXGetTexObjData( GXTexObj* obj 


width, 
height, 
format, 
mipmap, 
max_lod ) 


obj, 

image ptr, 
width, 
height, 
format, 
wrap_s, 
wrap t, 
mipmap ) 


) 


GXTexFmt GXGetTexObjFmt ( GXTexObj* obj ) 


ul6 GXGetTexObjHeight ( GXTexObj* obj 


) 


GXBool GXGetTexObjMipMap( GXTexObj* obj ) 


void *GXGetTexObjUserData( GXTexObj* obj ) 


ul6 GXGetTexObjWidth( GXTexObj* obj 


) 


GXTexWrapMode GXGetTexObjWrapS( GXTexObj* obj ) 


GXTexWrapMode GXGetTexObjWrapT( GXTexObj* obj ) 


void GxXGetTlutObjA11 ( 
GXTlutObj* 
void** 
GXTlutFmt* 
ul6* 


tlut_obj, 
lut, 

fmt, 
n_entries ) 


void* GXGetTlutObjData( GXTlutObj* tlut_obj ) 


GXTlutFmt GXGetTlutObjFmt( GXTlutObj* tlut_obj ) 


ul6 GXGetTlutObjNumEntries( GXTlutObj* tlut_obj ) 


void GXInitTexCacheRegion ( 
GXTexRegion* 
GXBool 
u32 
GXTexCacheSize 
u32 
GXTexCacheSize 


void GXInitTexOb} ( 
GXTexObj* 
void* 
ul6é 
ul6é 
GXTexFmt 
GXTexWrapMode 
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region, 

is_32b mipmap, 
tmem_even, 
size_even, 
tmem_odd, 

size _odd ) 


obj, 

image ptr, 
width, 
height, 
format, 
wrap_s, 
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Texture 


GXTexWrapMode 
GXBool 


void GXInitTexObjCI ( 
GXTex0Obj* 
void* 
ul6 
ul6 
GXCITexFmt 
GXTexWrapMode 
GxXTexWrapMode 
GXBool 
u32 


void GXInitTex0ObjLOD ( 
GXTexObj* 
GXTexFilter 
GXTexFilter 
£32 
£32 
£32 
GXBool 
GXBool 
GXAnisotropy 


void GXInitTex0ObjUserData ( 
GXTexObj* 
void* 


void GXInitTexPreLoadRegion ( 
GXTexRegion* 
u32 
u32 
u32 
u32 


void GxXInitTlutOb3 ( 
GXTlutObj* 
void* 
GXTlutFmt 
ul6 


void GXInitTlutRegion ( 
GXTlutRegion* 
u32 
GXTlutSize 


void GXInvalidateTexAll( void ) 


void GXInvalidateTexRegion( GXTexRegion* region ) 


void GXLoadTex0Ob3 ( 
GXTexObj* 
GXTexMapID 


void GXLoadTex0bj PreLoaded ( 
GXTexObj* 
GXTexRegion* 
GxXTexMapID 


void GXLoadTlut ( 
GXTlutObj* 
use 


void GXPreLoadEntireTexture ( 
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wrap t, 
mipmap ) 


obj, 

image _ptr, 
width, 
height, 
format, 
wrap _s, 
wrap t, 
mipmap, 
tlut_name ) 


obj, 
min_filt, 
mag filt, 
min_lod, 
max_lod, 
lod_bias, 
bias_clamp, 
do_edge_lod, 
max_aniso ) 


obj, 
user data ) 


region, 

tmem_even, 
size_even, 
tmem_odd, 
size odd ) 


tlut_obj, 
lut, 

fmt, 
n_entries ) 


region, 
tmem_addr, 
tlut_size ) 


obj, 
id ) 


obj, 
region, 
id ) 


tlut_obj, 
tlut_name ) 
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GXTex0Obj* 
GXTexRegion* 


void GXSetTexCoordScaleManually ( 
GXTexCoordID 
GXBool 
ul6 
ul6 


GXTexRegionCallback GXSetTexRegionCallback( GXTexRegionCallback f ) 


GXTlutRegionCallback GXSetTlutRegionCallBack( GXTlutRegionCallback f 
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tex obj, 
region ) 


texcoord, 
enable, 
BS; 

ts ) 
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Transform 


A.15 Transform 


Code 138 - Transform 


void GXGetProjectionv ( 


void 


void 


void 


void 


void 


void 


void 


void 


void 


void 


void GxXSetCurrentMtx ( 


void GxXSetProjection ( 


void 


GXGetViewport ( 


GXGetViewportv ( 


GXLoadNrmMt xImm ( 


GXLoadNrmMt xImm3x3 ( 


GXLoadNrmMtxIndx3x3 ( 


GXLoadPosMtxImm ( 


GXLoadPosMtxIndx ( 


GXLoadTexMtxImm ( 


GXLoadTexMtxIndx ( 


GXProject ( 


) 


£32* 
£32* 
£32% 
£32* 
£3.2* 
£3 2% 


£32 


u32 


£32 
u32 


ul6é 
u32 


£32 
u32 


ul6é 
u32 


£32 
u32 


GxXTexMtxType 


ul6é 
u32 


GxXTexMtxType 


£32 
£32 
£32 
£32 
£32* 
£O2% 
£32* 
£32% 
£3 2% 


£32 


GXProjectionType 


GXSetViewport ( 
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xOrig, 
yOrig, 
wd, 
ht, 
neardZ, 
farZ ) 


mtxPtr[3] [4], 
id ) 


mtxPtr[3] [3], 
id ) 


mtxIndx, 
id ) 


mtxPtr[3][4], 
id ) 


mtxIndx, 
id ) 


mtxPtr[] [4], 
id; 
type ) 


mtxIndx, 
id, 
type ) 


mx, 
my, 

mz, 

mtx[3] [4], 
Pp, 

Vp, 

SX, 

sy, 

sz ) 


mtx[4] [4], 
type ) 
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void GxXSetViewportJitter ( 


void GXSetViewportv( £32* vp ) 
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£32 
£32 
£32 
£32 
£32 
£32 


£32 
£32 
£32 
£32 
£32 
£32 
ws 


xOrig, 
yOrig, 
wd, 
ht, 
nearZ, 
farZ ) 


xOrig, 
yOrig, 
wd, 
ok, 
nearZ, 
farZ, 
field 


) 
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Appendix B. GXInit defaults 
This appendix lists state set by GXInit. 
Code 139 - GXInit defaults 


// Color definitions 


#define GX_DEFAULT_BG {64, 64, 64, 255} 
#define BLACK {0, 0, 0, 0} 
#define WHITE {255 AZ 5 Sp 2Z5oy A255} 
// 
// Render Mode 
7 i 
// (set 'rmode' based upon VIGetTvFormat(); code not shown) 
// 
// Geometry and Vertex 
// 


GXSetTexCoordGen (GX_TEXCOORDO, GX_TG MTX2x4, GX_TG TEX0, GX_IDENTITY 
GXSetTexCoordGen (GX_TEXCOORD1, GX_TG MTX2x4, GX_TG TEX1, GX_IDENTITY 
GXSetTexCoordGen (GX_TEXCOORD2, GX_TG_ MTX2x4, GX TG TEX2, GX IDENTITY 
GXSetTexCoordGen (GX_TEXCOORD3, GX_TG MTX2x4, GX TG TEX3, GX IDENTITY 
GxXSetTexCoordGen (GX_TEXCOORD4, GX_TG MTX2x4, GX TG TEX4, GX IDENTITY 
GXSetTexCoordGen (GX_TEXCOORD5, GX_TG MTX2x4, GX_TG TEX5, GX_IDENTITY 
GXSetTexCoordGen (GX_TEXCOORD6, GX_TG MTX2x4, GX_TG TEX6, GX_IDENTITY 
GXSetTexCoordGen (GX_TEXCOORD7, GX_TG MTX2x4, GX _TG TEX7, GX_IDENTITY 
GxXSetNumTexGens (1); 

GxClearVtxDesc(); 


GXInvalidateVtxCache(); 












































GXSetLineWidth(6, GX_TO_ ZERO); 
GXSetPointSize(6, GX_TO_ ZERO); 
GxXEnableTexOffsets ( GX_TEXCOORDO, GX_DISABLE, GX_DISABLE 
GxXEnableTexOffsets( GX_TEXCOORD1, GX_DISABLE, GX_DISABLE 
GxXEnableTexOffsets ( GX_TEXCOORD2, GX_DISABLE, GX_DISABLE 
GxXEnableTexOffsets ( GX_TEXCOORD3, GX_DISABLE, GX_DISABLE 
GxXEnableTexOffsets( GX_TEXCOORD4, GX_DISABLE, GX_DISABLE 
GxXEnableTexOffsets( GX_TEXCOORD5, GX_DISABLE, GX_DISABLE 
GxXEnableTexOffsets ( GX_TEXCOORD6, GX_DISABLE, GX_DISABLE 
GxEnableTexOffsets ( 





























GX_TEXCOORD7, GX_DISABLE, GX_DISABLE 


// 
// Transformation and Matrix 
// 


// (initialize 'identity_mtx' to identity; code not shown) 


// Note: projection matrix is not initialized! 
GXLoadPosMtxImm(identity mtx, GX_PNMTX0); 
GXLoadNrmMtxImm(identity mtx, GX_PNMTX0); 


( 

GXSetCurrentMtx (GX_PNMTX0) ; 
GXLoadTexMtxImm(identity mtx, GX_IDENTITY, GX_MTX3x4); 
GXLoadTexMtxImm(identity mtx, GX_PTIDENTITY, GX _MTX3x4); 
GXSetViewport (0.0F, // lett 

0.0F, // top 

(float) rmode->fbWidth, // width 

(float) rmode->xfbHeight, // height 

0.0F, // nearz 

1.0F); if fare 
// 
// Clipping and Culling 
// 
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GXSetCoPlanar (GX_DISABLE) ; 

GXSetCullMode (GX_CULL_BACK) ; 

GXSetClipMode (GX_CLIP_ENABLE) ; 

GXSetScissor(0, 0, (u32)rmode->fbWidth, (u32) rmode->efbHeight) ; 
GXSetScissorBoxOffset(0, 0); 


// 
// WGighting - pass vertex color through 
// 
GXSetNumChans(0); // no colors by default 
GxXSetChanCtrl ( 

GX_COLOROAO, 

GX_DISABLE, 

GX_SRC_REG, 

GX_SRC_VTX, 

GX_LIGHT NULL, 

GX_DF_NONE, 

GX_AF_NONE ); 


GXSetChanAmbColor (GX_COLOROA0, BLACK) ; 
GXSetChanMatColor(GX_COLOROAO, WHITE); 


GXSetChanCtrl ( 
GX_COLORIAI1, 
GX_DISABLE, 
GX_SRC_REG, 
GX_SRC_VTX, 
GX_LIGHT_ NULL, 
GX_DF_NONE, 
GX_AF_NONE ); 


GXSetChanAmbColor (GX_COLOR1A1, BLACK) ; 
GXSetChanMatColor (GX_COLOR1A1, WHITE) ; 


// 

// Texture 

// 
GXInvalidateTexAll(); 


// Allocate 8 32k caches for RGBA texture mipmaps. 
// Equal size caches to support 32b RGBA textures. 
// 


// (code not shown) 


// Allocate color index caches in low bank of TMEM. 
// Each cache is 32kB. 
// Even and odd regions should be allocated on different address. 


// (code not shown) 


// Allocate TLUTs, 16 256-entry TLUTs and 4 1K-entry TLUTs. 
// 256-entry TLUTs are 8kB, 1k-entry TLUTs are 32kB. 
// 


// (code not shown) 


// 

// Set texture region and tlut region Callbacks 

// 
GXSetTexRegionCallback(__GXDefaultTexRegionCallback) ; 
GXSetTlutRegionCallback(__GXDefaultTlutRegionCallback) ; 


// 
// Texture Environment 
// 
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GXInit defaults 

















GXSetTevOrder (GX_TEVSTAGEO, GX_TEXCOORDO 
GxXSetTevOrder (GX_TEVSTAGE1, GX_TEXCOORD1 
GXSetTevOrder (GX_TEVSTAGE2, GX_TEXCOORD2 
GXSetTevOrder (GX_TEVSTAGE3, GX_TEXCOORD3 
GxXSetTevOrder (GX_TEVSTAGE4, GX_TEXCOORD4 
GxXSetTevOrder (GX_TEVSTAGE5, GX_TEXCOORD5 
GXSetTevOrder (GX_TEVSTAGE6, GX_TEXCOORD6 
GxXSetTevOrder (GX_TEVSTAGE7, GX_TEXCOORD7 
GXSetTevOrder (GX_TEVSTAGE8, GX_TEXCOORD_ 
GXSetTevOrder (GX_TEVSTAGE9, GX_TEXCOORD_ 
GXSetTevOrder (GX_TEVSTAGE10,GX_TEXCOORD | 
GxXSetTevOrder (GX_TEVSTAGE11,GX_TEXCOORD | 
GXSetTevOrder (GX_TEVSTAGE12,GX_TEXCOORD _ 
GXSetTevOrder (GX_TEVSTAGE13,GX_TEXCOORD | 
GxXSetTevOrder (GX_TEVSTAGE14,GX_TEXCOORD | 
GXSetTevOrder (GX_TEVSTAGE15,GX_TEXCOORD_ 
GXSetNumTevStages (1); 

GXSetTevOp (GX_TEVSTAGEO, GX_REPLACE) ; 








GXSetAlphaCompare (GX_ALWAYS, 0, GX _AOP A 
GXSetZTexture (GX_ZT DISABLE, GX_TF_28, 0 


for (i = 


GXSetTevKColorSel 











ND 
\e 


GX_TEXMAPO, 
GX_TEXMAP1, 
GX_TEXMAP2, 
GX_TEXMAP3, 
GX_TEXMAP4, 
GX_TEXMAP5, 
GX_TEXMAP6, 
GX_TEXMAP7, 











GX_TEXMAP _ 


GX_TEXMAP | 


GX_TEXMAP _ 


GX_TEXMAP_ 
GX_TEXMAP_ 
GX_TEXMAP_ 
GX_TEXMAP_ 





GX_TEXMAP_ 


Gx_co 
GxX_co 
GxX_co 
GX_CcO 
GX_cO 
Gx_cO 
Gx_co 
Gx_co 


UL 


Ul 








222222224 
C 
ie. 








, GX_ALWAYS, 0); 


GX_TEVSTAGEO; i < GX_MAX TEVSTAGE; i+ 


(GXTevStageID) i, 


( 
GXSetTevKAlphaSel ((GXTevStageID) i, 
( 


GXSetTevSwapMode 


} 


(GXTevStageID) i, 


GXSetTevSwapModeTable (GX_TEV_SWAPO, 


GX_CH_RED, GX_CH_GREEN, GX_CH_ BLUE, 





GxXSetTevSwapModeTable (GX_TEV_SWAP1, 
GX_CH_RED, GX_CH_ RED, GX_CH RED, GX_CH_ ALPHA); 
GXSetTevSwapModeTable (GX_TEV_SWAP2, 
GX_CH_GREEN, GX_CH GREEN, GX_CH GREEN, GX_CH ALPHA); 
GXSetTevSwapModeTable (GX_TEV_SWAP3, 
GX_CH_BLUE, GX_CH_ BLUE, 


ti 


GX_TEV_KCSEL 1 4 ); 
GX_TEV_KASEL 1 ); 
GX_TEV_SWAPO, 





LO 
LO 
LO 
LO 
LO 
LO 
LO 





LO 
LO 
LO 
LO 
LO 
LO 
LO 





GX_TEV_SWAPO ); 








// Indirect Textures. 
GX_TEVSTAGEO; i < GX_MAX_ TEVSTAGE; it 
GXSetTevDirect ((GXTevStageID) i); 


for (i = 


} 
GxSetNum 
GXSetInd!l 
GXSetInd!l 
GXSetInd!l 
GXSetInd!l 


// 





// Pixel 


// 
GXSetFog 


ndStages (0); 

[TexCoordScale( GX_INDTEXSTAGEO, 

TexCoordScale( GX_INDTEXSTAGE1, 

[TexCoordScale( GX_INDTEXSTAGE2, 

TexCoordScale( GX_INDTEXSTAGE3, 
Processing 

(GX_FOG NONE, 0.0F, 1.0F, 0.1F, 


GXSetFogRangeAdj( GX_DISABLE, 0, 0 ); 
GxXSetBlendMode (GX_BM_NONE, 


GX_BL_ SRCALPHA, 


GX 





GX 





GX 





GX 





hw 


GX_CH_ BLUE, 
+) { 
ITS 1, GX_ITS 1 
ITS_1, GX_ITS 1 
ITS 1, GX_ITS 1 
ITS_1, GX_ITS_1 
OF, BLACK) ; 


// sre factor 


GX_BL_INVSRCALPHA, // dst factor 


GX_LO CLEAR) ; 


GXSetColorUpdate (GX_ENABLBE) ; 
GxXSetAlphaUpdate (GX_ENABLE) ; 

GXSetZMode (GX_TRUE, GX_LEQUAL, GX_TRUE) ; 
GXSetZCompLoc (GX_TRUE); // before textur 
GXSetDither (GX_ENABLE) ; 
GXSetDstAlpha(GX_DISABLE, 0); 


GXSetPixelFmt (GX_PF RGB8 224, 





GXSetFieldMask ( GX_ENABLE, GX_ENABLE ); 
GXSetFieldMode ( (GXBool) (rmode->field_ rendering), 
((rmode->viHeight == 2*rmode->xfbHeight) ? 


GX_ENABLE : GX_DISABLE) ) 
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GX_CH ALPHA) ; 


GX_CH_ ALPHA) ; 
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// 

// Framebuffer 

// 

GXSetCopyClear (GX DEFAULT BG, GX MAX 224); 





GXSetDispCopySrc(0, 0, rmode->fbWidth, rmode->efbHeight) ; 


GXSetDispCopyDst (rmode->fbWidth, rmode->efbHeight) ; 


GXSetDispCopyYScale((f£32) (rmode->xfbHeight) / (f32) (rmode->efbHeight) ); 
GXSetCopyClamp ( (GXFBClamp) (GX_CLAMP_TOP | GX_CLAMP BOTTOM) ); 
rmode->vfilter) ; 


GXSetCopyFilter(rmode->aa, rmode->sample pattern, 


GXSetDispCopyGamma( GX_GM 1 0 ); 
GXSetDispCopyFrame2Field(GX_COPY_ PROGRESSIVE) ; 
GXClearBoundingBox(); 


// 

// CPU direct EFB access 
// 

GXPokeColorUpdate (GX_TRUE) ; 
GXPokeAlphaUpdate (GX_TRUE) ; 
GXPokeDither (GX FALSE) ; 


GX_TRUE, 


GX LO SET); 





GXPokeBlendMode (GX_BM NONE, GX_BL ZERO, GX_BL_ONE, 
( 


GXPokeAlphaMode (GX_ALWAYS, 0); 
GXPokeAlphaRead (GX_READ FF); 
GXPokeDstAlpha(GX_DISABLE, 0); 
GXPokeZMode (GX_TRUE, GX_ALWAYS, GX_TRUE); 





Lf 

// Performance Counters 

// 

GXSetGPMetric(GX_PERFO NONE, GX_PERF1 NONE) ; 
GXClearGPMetric();// Color definitions 


#define GX_DEFAULT_BG {64, 64, 64, 255} 
#define BLACK {0 0; <0}; 0} 
#define WHITE {255;¢ 255, °255;-2255} 


// 
// Render Mode 
// 


// (set 'rmode' based upon VIGetTvFormat(); code not shown) 


// 

// Geometry and Vertex 

// 

GXSetTexCoordGen (GX_TEXCOORDO, GX_TG MTX2x4, GX_1 
GxXSetTexCoordGen (GX_TEXCOORD1, GX_TG MTX2x4, GX T 
GXSetTexCoordGen (GX_TEXCOORD2, GX_TG_ MTX2x4, GX 7 
GxXSetTexCoordGen (GX_TEXCOORD3, GX_TG MTX2x4, GX_1 
GXSetTexCoordGen (GX_TEXCOORD4, GX_TG MTX2x4, GX_1 
GXSetTexCoordGen (GX_TEXCOORD5, GX_TG MTX2x4, GX_1 
GXSetTexCoordGen (GX_TEXCOORD6, GX_TG MTX2x4, GX_1 























['G 
['G 
['G 


6 
on 
on 
on 





GXSetTexCoordGen (GX_TEXCOORD7, GX_TG MTX2x4, GX_1 
GxXSetNumTexGens (1); 

GXClearVtxDesc(); 

GXInvalidateVtxCache(); 


GXSetLineWidth(6, GX_TO_ ZERO); 

GXSetPointSize(6, GX_TO_ ZERO); 

GxXEnableTexOffsets ( GX_TEXCOORDO, GX_DISABLE, GX_DISABLE 
GxXEnableTexOffsets( GX_TEXCOORD1, GX_DISABLE, GX_DISABLE 
GxXEnableTexOffsets ( GX_TEXCOORD2, GX_DISABLE, GX_DISABLE 
GxXEnableTexOffsets ( GX_TEXCOORD3, GX_DISABLE, GX_DISABLE 
GxXEnableTexOffsets( GX_TEXCOORD4, GX_DISABLE, GX_DISABLE 
GxXEnableTexOffsets( GX_TEXCOORD5, GX_DISABLE, GX_DISABLE 
GxXEnableTexOffsets ( 
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TEXO, 
TEX1, 
TEX2, 
[EX3, 
TEX4, 
[EX5, 
TEX6, 





a 


TEXT, 


GX_TEXCOORD6, GX DISABLE, GX_DISABLE 





DE 


NI 


DENT 


DE 


_IDE 
_IDE 
_IDE 


DENT 


_IDE 
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GxXEnableTexOffsets( GX_TEXCOORD7, GX_DISABLE, GX_DISABLE ); 


// 
// Transformation and Matrix 
// 


// (initialize 'identity_mtx' to identity; code not shown) 


// Note: projection matrix is not initialized! 
GXLoadPosMtxImm(identity mtx, GX_PNMTX0); 
GXLoadNrmMtxImm(identity mtx, GX_PNMTX0); 
GXSetCurrentMtx (GX_PNMTX0) ; 
( 
( 


GXLoadTexMtxImm(identity mtx, GX_IDENTITY, GX_MTX3x4); 
GXLoadTexMtxImm identity mtx, GX_PTIDENTITY, GX_MTX3x4) ; 
GXSetViewport (0.0F, // lett 
0.0F, // top 
(float) rmode->fbWidth, // width 
(float) rmode->xfbHeight, // height 
0.0F, // nearz 
1.0F); // farz 
// 
// Clipping and Culling 
// 


GxXSetCoPlanar (GX_DISABLE) ; 

GXSetCul1lMode (GX_CULL_BACK) ; 

GXSetClipMode (GX_CLIP_ENABLE) ; 

GXSetScissor(0, 0, (u32)rmode->fbWidth, (u32) rmode->efbHeight) ; 
GXSetScissorBoxOffset(0, 0); 


// 
// WGighting - pass vertex color through 
// 
GXSetNumChans(0); // no colors by default 
GxXSetChanCtrl ( 

GX_COLOROAO, 

GX_DISABLE, 

GX_SRC_REG, 

GX_SRC_VTX, 

GX_LIGHT NULL, 

GX_DF_NONE, 

GX_AF_NONE ); 


GXSetChanAmbColor (GX _COLOROA0, BLACK) ; 
GXSetChanMatColor (GX _COLOROAO, WHITE); 


GXSetChanCtrl ( 
GX_COLORIAI, 
GX_DISABLE, 
GX_SRC_REG, 
GX_SRC_VTX, 
GX_LIGHT NULL, 
GX_DF_NONE, 
GX_AF_NONE ); 


GXSetChanAmbColor (GX_COLOR1A1, BLACK) ; 
GXSetChanMatColor (GX_COLOR1A1, WHITE) ; 


// 

// Texture 

// 
GXInvalidateTexAll(); 


// Allocate 8 32k caches for RGBA texture mipmaps. 
// Equal size caches to support 32b RGBA textures. 
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// 


// (code not shown) 


// Allocate color index caches in low bank of TMEM. 
// Each cache is 32kB. 
// Even and odd regions should be allocated on different address. 


// 


// (code not shown) 


// Allocate TLUTs, 
// 256-entry TLUTs are 8kB, 


// 


// (code not shown) 


// 


// Set texture region and tlut region Callbacks 


// 


GXSetTexRegionCallback(__GXDefaultTexRegionCallback) ; 
GXSetTlutRegionCallback(__GxXDefaultTlutRegionCallback) ; 


// 


// Texture Environment 


// 

GxSetTevOrd 
GxSetTevOrd 
GxSetTevOrd 
GxXSetTevOrd 
GXSetTevOrd 
GxXSetTevOrd 
GxSetTevOrd 


GxSetTevOrd 
GXSetTevOrd 
GxXSetTevOrd 
GXSetTevOrd 
GxSetTevOrd 
GxSetTevOrd 
GxSetTevOrd 





GxSetTevOrder 


er 
er 
er 
er 
er 
er 
er 
GxXSetTevOrder 
er 
er 
er 
er 
er 
er 
er 


(GX_TEVS1 
(GX_TEVS1 
(GX_TEVS1 
(GX_TEVS1 
(GX_TEVS1 
(GX_TEVS1 
(GX_TEVS1 
(GX_TEVS1 
(GX_TEVS1 
(GX_TEVS1 
(GX_TEVS1 
(GX_TEVS1 
(GX_TEVS1 
(GX_TEVS1 
(GX_TEVS1 





[EXMAPO, GX_CO 
[EXMAP1, GX_CO 
[EXMAP2, GX_CO 
[EXMAP3, GX_CO 
[EXMAP4, GX_CO 
[EXMAP5, GX_CO 
[EXMAP6, GX_CO 








(GX_TEVS1 


GXSetNumTevStages (1); 





GXSetTevOp (GX_TEVSTAGEO, GX_REPLACE) ; 


[TAGEO, GX _TEXCOORDO, GX_1 
TAGE1, GX_TEXCOORD1, GX_1 
[TAGE2, GX _TEXCOORD2, GX_1 
[TAGE3, GX _TEXCOORD3, GX_1 
TAGE4, GX_TEXCOORD4, GX_1 
TAGES, GX_TEXCOORD5, GX_1 
[TAGE6, GX _TEXCOORD6, GX_1 
[TAGE7, GX_TEXCOORD7, GX_1 
[TAGE8, GX _TEXCOORD NULL, 
[TAGE9, GX _TEXCOORD NULL, 
[TAGE10,GX_ TEXCOORD NULL, 
[TAGE11,GX_TEXCOORD_ NULL, 
[TAGE12,GX_ TEXCOORD NULL, 
[TAGE13,GX_TEXCOORD NULL, 
TAGE14,GX_TEXCOORD_ NULL, 
TAGE15,GX_TEXCOORD_ NULL, 























[EXMAP7, GX_CO 


GX_TEXMAP NUL 
GX_TEXMAP_N 


GX_TEXMAP_N 


GX_TEXMAP_N 


GX_TEXMAP_N 


GX_TEXMAP_N 


GX_TEXMAP_N 


GX_TEXMAP_N 


Li 


Li 


Li 











op Ry Gay Re a ae 
E 


GXSetAlphaCompare (GX _ALWAYS, 0, GX_AOP AND, GX_ALWAYS, 0); 
GXSetZTexture (GX_ZT D 





SABLE, GX_TF_Z8, 0); 


for (i = GX_TEVSTAGEO; i < GX_MAX TEVSTAGE; i++) 
GXSetTevKColorSel 
GXSetTevKAlphaSel 
GxXSetTevSwapMode 


} 


{ 


(GXTevStageID) i, GX _TEV_KCSEL 14 ); 


GXSetTevSwapModeTable (GX_TEV_SWAPO, 


GX_CH_RED, GX_CH_GREEN, GX_CH_ BLUE, 





GxXSetTevSwapModeTable (GX_TEV_SWAP1, 
GX_CH_RED, GX_CH_ RED, GX_CH_ RED, GX_CH_ ALPHA); 
GXSetTevSwapModeTable (GX_TEV_SWAP2, 


( 
((GXTevStageID) i, GX_TEV_KASEL 1 ); 
((GXTevStageID) i, GX_TEV_ SWAPO, 


16 256-entry TLUTs and 4 1K-entry TLUTs. 
lk-entry TLUTs are 32kB. 


LO 
LO 
LO 
LO 
LO 
LO 
LO 
LO 
oy 

oy 

oy 


oy 








GX_CH_ GREEN, GX_CH GREEN, GX_CH GREEN, GX_CH ALPHA); 





GXSet TevSwapModeTable (GX_TEV_SWAP3, 


// Indirect Textures. 


for (i = GX_TEVSTAGEO; i < GX MAX TEVSTAGE; i++) 


GX_CH_BLUE, GX_CH_BLUE, 


GXSetTevDirect ((GXTevStageID) i); 


} 


GXSetNumIndStages (0) ; 


GXSetIndTexCoordScale( GX_INDTEXSTAGEO, 
GxXSetIndTexCoordScale( GX_INDTEXSTAGE1, 
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GX_CH_ BLUE, 


GX ITS 1, 


{ 


GX ITS 1 





GX ITS 1, 


GX ITS 1 





)e 
)e 





ROAO) ; 
GX_CO 
GX_CO 
GX_CO 
GX_CcO 
GX_CO 
GX_CO 
GX_CO 
GX_CcO 


LO 
LO 
LO 
LO 
LO 
LO 
LO 





LO 


GX_TEV_ SWAPO ); 


GX_CH_ ALPHA) ; 


GX_CH_ ALPHA) ; 








Co Rg ee Rae ae a 
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GXSetIndTexCoordScale( GX_INDTEXSTAGE2, GX_ITS_1, GX_ITS 1 ); 

GXSetIndTexCoordScale( GX_INDTEXSTAGE3, GX_ITS_1, GX_ITS 1 ); 

// 

// Pixel Processing 

// 

GXSetFog(GX_FOG_ NONE, 0.0F, 1.0F, 0.1F, 1.0F, BLACK); 

GXSetFogRangeAdj ( GX_DISABLE, 0, 0 ); 

GXSetBlendMode (GX_BM_ NONE, 
GX_BL_SRCALPHA, // src factor 
GX_BL_INVSRCALPHA, // dst factor 
GX_LO_CLEAR) ; 

GxXSetColorUpdate (GX_ENABLE) ; 

GxXSetAlphaUpdate (GX_ENABLBE) ; 

GXSetZMode (GX_TRUE, GX_LEQUAL, GX_TRUE) ; 

GXSetZCompLoc (GX_TRUE); // before texture 

GxXSetDither (GX_ENABLE) ; 

GXSetDstAlpha(GX_DISABLE, 0); 

GXSetPixelFmt (GX PF RGB8 224, GX ZC LINEAR); 

GXSetFieldMask( GX_ENABLE, GX_ENABLE ); 

GXSetFieldMode ( (GXBool) (rmode->field_ rendering), 
((rmode->viHeight == 2*rmode->xfbHeight) ? 
GX_ENABLE GX_DISABLE) ); 

// 

// Framebuffer 

// 


GXSetCopyClear (GX DEFAULT BG, 
GXSetDispCopySrc(0, 0, rmode->fbWidth, 
GXSetDispCopyDst (rmode->fbWidth, 





GXSetCopyFilter (rmode->aa, 
GXSetDispCopyGamma( GX_GM 1 0 ); 


GX MAX Z24); 
rmode->efbHeight) ; 
rmode->efbHeight) ; 

GXSetDispCopyYScale( (£32) (rmode->xfbHeight) / 
GXSetCopyClamp ( (GXFBClamp) (GX_CLAMP_TOP 
rmode->sample pattern, 


(£32) (rmode->efbHeight) ); 
| GX_CLAMP_ BOTTOM) ); 
GX_TRUE, rmode->vfilter) ; 


GXSetDispCopyFrame2Field(GX_COPY_PROGRESSIVE) ; 


GXClearBoundingBox(); 
// 
// CPU direct EFB access 


// 

GXPokeColorUpdate (GX_TRUE) ; 
GXPokeAlphaUpdate (GX_TRUE) ; 
GXPokeDither (GX_FALSE) ; 
GXPokeBlendMode (GX_BM NONE, GX _BL_ ZERO, 
GXPokeAlphaMode (GX_ALWAYS, 0); 
GXPokeAlphaRead (GX_READ FF); 
GXPokeDstAlpha(GX_DISABLE, 0); 
GXPokeZMode (GX_TRUE, GX_ALWAYS, 





// 
// 
// 
GXSetGPMetric(GX_PERFO NONE, 
GXClearGPMetric(); 


Performance Counters 
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GX BL ONE, GX_LO SET); 





GX_TRUE) ; 


GX_PERF1_NONE) ; 
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Appendix C. Display list format 


Display list commands are listed in <dolphin/gx/GXCommandList.h>. The graphics processor (GP) 
can interpret the described display list format directly. The bulk of a display list is expected to contain infor- 
mation for describing primitives and their vertices. The display list format contains only limited state com- 
mands. Most state commands, such as the vertex descriptor or vertex attribute format, should be set by 
the appropriate GX API functions prior to calling the display list. We describe the format of certain state- 
setting commands below. 


Display lists must be 32-byte aligned in DRAM and be a multiple of 32 bytes long. Display lists can be pad- 
ded with Gx_ NOP commands to fill out a 32-byte line. 


Display lists are executed by calling the GXCallDisplayList function with a pointer to the display list 
and the number of bytes in the display list. 


Note: There is no “end of display list” token, since you are providing an explicit length in the call. 


C.1 Display list opcodes 


A display list consists of a stream of commands (opcodes) followed by their associated data. There can be 
any number of commands in any sensible sequence within a display list. 


Table 21 - Display list opcodes 
































LO) efexeye(:) ‘ 
Opcode Name Bits[7:0] Next Field Followed By 

GX_DRAW QUADS 10000VatIdx[2:0] | VertexCount[15:0] | Vertex data 
stream 

GX_DRAW_ TRIANGLES 10010Vatldx[2:0] | VertexCount[15:0] | Vertex data 
stream 

GX_DRAW_ TRIANGLE STRIP | 10011Vatldx[2:0] | VertexCount[15:0] | Vertex data 
stream 

GX DRAW TRIANGLE FAN 10100Vatldx[2:0] | VertexCount[15:0] | Vertex data 
stream 

GX_DRAW_LINES 10101Vatldx[2:0] | VertexCount[15:0] | Vertex data 
stream 

GX_DRAW LINE STRIP 10110Vatldx[2:0] | VertexCount[15:0] | Vertex data 
stream 

GX_DRAW_ POINTS 10111Vatldx[2:0] | VertexCount[15:0] | Vertex data 
stream 

GX LOAD BP REG 01100001 (0x61) | Register[31:0] none 

GX_NOP 00000000 None none 




















Note: Vatldx[2:0] is the 3-bit Vertex Attribute Format Table index. This format will be used to interpret the 
vertex data arrays. 


VertexCount[15:0] is a 16-bit count of the number of vertices to follow this command. 
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C.2 Attribute order requirements 


Graphics Library (GX) 


The current Vertex Descriptor (GXSetVtxDesc) is used to indicate the number and type of data or indices. 
Recall that indices may be 8 bits or 16 bits. The current attribute array’s (GkXSetArray) base pointers and 
strides are used for referencing indexed data. These should all be set before executing the display list. 


The table below specifies the required order of attribute values (immediate data or indices) in the display 
list. The order is identical to that specified for immediate mode primitives drawn using GXBegin/GXEnd. 





Table 22 - Vertex index stream order requirements 


Order Attribute 


0 


GX_VA_POSMATIDX 





— 


GX_VA_TEXOMTXIDX 





GX_VA_TEX1MTXIDX 





GX_VA_TEX2MTXIDX 





GX_VA_TEX3MTXIDX 





GX_VA_TEX4MTXIDX 





GX_VA_TEXS5MTXIDX 
GX_VA_TEX6MTXIDX 





oO; oO}; NI] O/}] GO] BR] WY] PD 


= 
oO 


GX_VA_TEX7MTXIDX 
GX_VA_POS 
GX_VA_NRM 





= 
— 


GX_VA_COLORO 





—| — 
WO] NM 


GX_VA_COLOR1 
GX_VA_TEXCOORDO 





—_ 
A 


GX_VA_TEXCOORD1 





= 
Oo | oO 


GX_VA_TEXCOORD2 
GX_VA_TEXCOORD3 





= a 
co] N 


GX_VA_TEXCOORD4 
GX_VA_TEXCOORD5 





= 
o 


GX_VA_TEXCOORD6 





N 
oO 








GX_VA_TEXCOORD7 
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The table below provides an example display list as a list of hexadecimal numbers. This display list could 
be called using "Code 140 - Code necessary to utilize Example Display List" on page 204. 
Note: Since the display list only contains indices to attributes, the formats of the attributes could be 


changed without affecting the display list. However, the attribute format must be described accu- 
rately (i.e., format of the data must match the format described) in order for the display list to be 


drawn correctly. 
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Table 23 - Example_Display_List 


DY =x-Yoq al ey d (ola) Data 





















































GX_DRAW_TRIANGLES, GX_VTXFMTO 0x90 
number of verts = 6 0x0006 
pos_indx (8b) 0x00 
norm_indx (8b) 0x10 
tex_coord_0 (16b) 0x0011 
pos_indx (8b) 0x01 
norm_indx (8b) 0x11 
tex_coord_0 (16b) 0x0012 
pos_indx (8b) 0x02 
norm_indx (8b) 0x12 
tex_coord_0 (16b) 0x0013 
pos_indx (8b) 0x03 
norm_indx (8b) 0x13 
tex_coord_0 (16b) 0x0014 
pos_indx (8b) 0x04 
norm_indx (8b) 0x14 
tex_coord_0 (16b) 0x0015 
pos_indx (8b) 0x05 
norm_indx (8b) 0x15 
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Table 23 - Example_Display_List 

















tex_coord_0 (16b) 0x0016 
no_op 0x00 
no_op 0x00 
no_op 0x00 
no_op 0x00 
no_op, pad to 32B 0x00 














Note: The display list described in "Table 23 - Example Display List" on page 203 assumes the follow- 
ing code sequence to execute: 


Code 140 - Code necessary to utilize Example_Display_List 


GXClearVtxDesc(); 

GXSetVtxDesc( GX_VA_POS, GX_INDEX8 ); 

GXSetVtxDesc( GX_VA_NRM, GX_INDEX8 ); 

GXSetVtxDesc( GX_VA_TEX0, GX_INDEX16 ); 

GXSetVtxAttrFmt ( GX_VTXFMTO, GX VA POS, GX POS XYZ, GX F32, 0 ); 
GXSetVtxAttrFmt ( GX_VTXFMTO, GX VA NRM, GX NRM XYZ, GX S8, 6 ); 
GXSetVtxAttrFmt ( GX _VTXFMT0, GX_VA_TEXO, GX_TEX ST, GX_U16, 5 ); 
GXSetArray( GX_VA_POS, &mypos, sizeof (f32)*3 ); 

GxXSetArray( GX_VA_NRM, &mynrm, sizeof(s8)*3 ); 

GXSetArray( GX_VA_TEX0, &mytex, sizeof(ul6)*2 ); 
GXCallDisplayList( &Example Display List, 32 ); 











C.4 State commands 


Inserting state commands into display lists requires a certain amount of coordination between the state 
that is set by immediate-mode API functions and the state that is set by the display list. You should keep in 
mind that various types of conflicts could arise. 


The first state commands described here are those related to loading texture objects. These “commands” 
are all implemented by setting registers within the GP. Before describing these registers, we explain how 
the GX API loads texture objects. 


Loading a texture object via GXLoadTex0bj performs the following steps: 
1. GX calls the texture region callback in order to obtain a region of TMEM to use with the texture. 


2. GX records the desired texture ID into the texture object, which consists of GP texture registers. 
3. The texture registers are written into the FIFO. 
4 


If this is a color-indexed texture, GX calls the TLUT region callback to obtain the TMEM address for the 
TLUT. This is encoded into a register in the texture object and written into the FIFO. 


5. GX sets a flag to indicate that the texture coordinate scaling registers need to be updated. 
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Calling GXLoadTex0bj PreLoaded is similar, except that step 1 above is omitted. In either case, step 5 is 
necessary because the GP requires that the texture coordinates be scaled according to the texture that 
they will look up. Prior to any geometry being drawn, the update flag is examined and, if set, a function is 
called to update the scaling registers (and the flag is then cleared). The update function behaves as fol- 
lows: 


1. If all texture coordinates are being scaled manually, it simply returns. 


2. Loop over the indirect stages: 
For all non-manual coordinates, it sets the scale register according to the associated map size. 


3. Loop over the normal TEV stages: 
For all non-manual coordinates, it sets the scale register according to the associated map size. 


4. \Itsets the texture coordinate range bias appropriately as each scale register is adjusted. 
Notes: 


¢ Ifa texture coordinate is associated with both an indirect stage and a normal TEV stage, the coor- 
dinate is scaled according to the size of the texture for the normal TEV stage. 


* GXSetTevOrder and GXSetIndTexOrder will also set the update flag. 


Given the preceding information, we should emphasize that putting texture object-loading commands into 
a display list requires that all of the steps be done manually. In particular, you may wish to manage TMEM 
according to your own scheme. You should also consider calling GkSet TexCoordScaleManual1y for all 
of the texture coordinates. 


The subsequent sections detail the display list commands (registers) for loading textures. In order to send 
any of these registers down, you must send down the following sequence for each register: 





0x61 (1 byte) GX _LOAD_BP_REG, the register load com- 
mand. 
XXXX (4 bytes, big-endian) |The actual register as specified below. 


Unmentioned register bits are reserved and should be set to 0. 
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C.4.1 Set_TextureMode0 
Indicates texture lookup and filtering modes. 
Code 141 - Set_TextureMode0 


Graphics Library (GX) 


Content 


8 


9-16 


19-20 


21 


24-31 


linear 

lin_mip_ near 
lin_mip lin 
reserved 

use diagonal LOD 


Oxa0 + GXTexMapID (id >= 4) 


Values 
wrap s 0: clamp 
1: repeat 
2: mirror 
3: reserved 
wrap t same values as wrap_s 
mag_filter O: near 
1: linear 
min_ filter O: near 4: 
1: near mip near 5% 
2: near _mip lin 6: 
3: reserved ae 
diag_lod_enable 0: use edge LOD ie 
lod_bias S2.5 (-4.0:3.99) (2’s complement format) 
max _aniso O: 1 
1: 2 (requires edge LOD) 
2: 4 (requires edge LOD) 
lod_clamp O: off 
(bias clamp) 1: on 
opcode 0x80 + GXTexMapID (id <= 3) 
Set_TextureMode1 


C.4.2 


Indicates min/max LOD info. 
Code 142 - Set_TextureMode1 


Content 


Oxa4 + GXTexMapID (id >= 4) 


C.4.3 


Values 
min lod U4.4 (0:10.0) 
max_lod U4.4 (0:10.0) 
opcode 0x84 + GXTexMapID (id <= 3) 
Set_Texturelmage0 


Indicates texture width, height, and format. 
Code 143 - Set_Texturelmage0 


Content 


24-31 


image_width 
image_height 
image_format 


opcode 


Val 


ues 

(0:1023) value is (real width) - 1 

(0:1023) value is (real height) - 1 
14 4: RGB565 8: C4 12: reserved 
18 5: RGB5A3 9: C8 13: reserved 
IA4 6: RGBA8 103 C14xX2 14: CMP 
TA8 7: reserved 11: reserved 15: reserved 
8 + GXTexMapID (id <= 3) 


Oxa8 + GXTexMapID (id >= 4) 
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C.4.4 Set_Texturelmage1 


Indicates where even LODs are stored (or cached) in TMEM. This data normally comes from a 
GXTexRegion object. 


Code 144 - Set_Texturelmage1 


Bits Content Values 


0-14 tmem_offset address of even LODs in TMEM) >> 5 


( 
15-17 cache width 3: 32KB 
4: 128KB 
5: S5SL2KB 
17-19 cache_height must be equal to cache_width 
20 image_type 0: cached 
1: preloaded 
24-31 opcode Ox8c + GXTexMapID (id <= 3) 


Oxac + GXTexMapID (id >= 4) 


C.4.5 Set_Texturelmage2 


Indicates where odd LODs are stored in TMEM (unused by most planar textures). This data normally 
comes from a GXTexRegion object. 


Code 145 - Set_Texturelmage2 


Bits Content Values 


0-14 tmem_offset address of odd LODs in TMEM) >> 5 


( 
15-17 cache width 32. 32KB 

4: 128KB 

5: S12KB 

0: none (only where odd is not used) 
17-19 cache_height must be equal to cache_width 
24-31 opcode 0x90 + GXTexMapID (id <= 3) 


OxbO + GXTexMapID (id >= 4) 


C.4.6 Set_Texturelmage3 
For cached textures, where the texture is found in main memory. 
Code 146 - Set_Texturelmage3 


Bits Content Values 
0-20 image _base (PHYSICAL address of texture in main memory) >> 5 
24-31 opcode 0x94 + GXTexMapID (id <= 3) 


Oxb4 + GXTexMapID (id >= 4) 
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C.4.7 Set_TextureTLUT 
For color-index textures, where the TLUT is found in the high TMEM bank. 
Code 147 - Set_TextureTLUT 


Bits Content 


0-9 tmem_offset 
10-11 tlut_format 
24-31 opcode 


C.4.8 SU_TSO 


Values 

(offset of TLUT from start of high bank in TMEM) >> 5 
O: IA8 1: RGB565 2: RGB5A3 

0x98 + GXTexMapID (id <= 3) 

Oxb8 + GXTexMapID (id >= 4) 


Indicates the texture coordinate scaling (s component). The point/line offset is normally set by 





GXEnableTexOffsets. 
Code 148 -SU_TSO 


Bits Content 


16 bs 
17 ws 
18 lf 
19 pf 


24-31 opcode 


C.4.9 SU_TS1 


U16 (s scale value - 1) for the specified texcoord 
Enables range bias for s (used with GX_REPEAT only) 
Enables cylindrical texcoord wrapping for s 

Enables texcoord offset for lines using this texcoord 
Enables texcoord offset for points using this texcoord 
0x30 + GXTexCoordID * 2 


Indicates the texture coordinate scaling (t component). 


Code 149 -SU_TS1 


Bits Content 


17 wt 
24-31 opcode 


RVL-06-0037-001-A 
Released: February 27, 2006 


Values 

U16 (t scale value - 1) for the specified texcoord 
Enables range bias for t (used with GX_REPEAT only) 
Enables cylindrical texcoord wrapping for t 

0x31 + GXTexCoordID * 2 
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Appendix D. Nintendo GameCube texture formats 


This appendix describes the Nintendo GameCube texture formats. The first section describes the bit order- 
ing within a texel. The next section describes how texels are arranged in tiles. The texture cache hardware 
fetches texels in tile units. The final section describes how tiles are organized into images. 


D.1 Texel formats 


The texturing hardware supports 14 native texture image formats specified by register 

image format [3:0]. Supported types are Intensity, Intensity Aloha, RGB, RGBA, Color Index, Com- 
pressed, and Z. Supported texel sizes are 4-bit, 8-bit, 16-bit, and 32-bit. The following figures illustrate the 
packing of texel components. 


Figure 74 - Texel formats 




















14 NO : nibble 0 
N1 : nibble 1 
18 
8-bit Z 
1A4 
IA8 
15 Byte 0 8 7 Byte 1 0 
16-bit Z : : Z16:8F | Po E270} |: (Note: use IA8 for 16-bit Z) 
R5G6B5 
15 Byte 0 8 7 Byte 1 0 
— - — — - (RGBS) 
RGB5A3 er 
(RGB4A3) 
RGBA8 (AR) 
(GB) 
24-bit Z (AR) 
(Note: use RGBA8 for 24-bit Z) 
Z[15:8] ft Bf ZTE) Sh (GB) 
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Ccl4 


cls 


cl14 





CMP 


RGBO0[15:0] 





RGB1[15:0] 





T[0][0] 


T[O][1] 


T[O][2] 


T[O][3] 





T[1][0] 


T[1][1] 


T[1[2] 


TL1[3] 





T[2][0] 


T[2][1] 


T{2 [2] 


T[2I[3] 








T[S][0] 





T[S][1] 








TLS [2] 


TLSI[3] 








63 


Byte 0 


56 55 


Byte 1 


Graphics Library (GX) 





48 47 Byte 2 40 39 Byte 3 32 





| Romo) [coo | eomor | Rao | cibor | ero 


31 


Byte 4 


24 23 


Byte 5 


16 15 Byte 6 8 7 Byte 7 0 





The compressed texture format allows for a 1-bit alpha. If a multi-bit alpha is desired, this can be accom- 
plished through the use of multi-texture. 


D.2 Texture tile formats 


Texture images are organized as 32-byte tiles. Each 32-byte tile represents a 2D region of texels. For 4-bit 
texels, each 32-byte tile represents 8x8 texels. For 8-bit texels, each 32-byte tile represents 4x8 texels. For 
16-bit texels, each 32-byte tile represents 4x4 tiles. For 32-bit texels, a pair of 32-byte tiles represents 4x4 
texels. This is illustrated in the following figures. 


Figure 75 - Texture tile formats 











4bit Format : 8x8 texels / cache line 











1F | 20 3F 











ByteO 











Texel 























3F 
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——?> Ss 
{ 0 7 
hf 8bit Format : 4x8 texels / cache line 
0 F |10 1F 
1F ByteO y» Byte31 
—> Ss 








16bit Format : 4x4 texels / cache line 









































8|9/A/B 0 7/8 F 
c|D/E/|F ByteO. ——_______________» Byte31 
—>s —>s 32bit Format : 4x4 texels / 2 cache lines 














0 AR 7| 8 AR F 
ByteOQ§ —____l__tn_y Byte31 























0 GB 7/8 GB F 
Byte32 > Byte63 















































AR GB 





Compressed Format : 8x8 texels / cache line 





0 1 2 3 
ByteO Byte31 




















4x4 texels/block 





D.3 Texture image formats 


In main memory, the starting address of an image is aligned to 32 bytes. The tiles that make up the image 
are stored in row-column order. Both cached and pre-loaded images have the same organization in main 
memory. 
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Each row and column of an image is padded to a 32-byte tile. For 32-bit texels, each row is actually pad- 
ded to a pair of 32-byte tiles. Each image is stored contiguously in main memory. For a mipmap pyramid, 
the levels are stored contiguously. The ordering of the images is from finest to coarsest. Each level of the 
mipmap pyramid is aligned to a 32-byte tile. This is illustrated in the following figures. 


Figure 76 - Texture image formats 











C|}D/}E/| F | 10) 11 





























32B tile 


In main memory, images are alligned to 32B. 

32B tiles are stored in row-column order. 

Each row and column is padded to 32B. 

For 32-bit format, each row is padded to a tile pair. 





Mipmap images are stored contiguously. 
Each level is alligned to 32B. 











32B Alligned 
Level 0 
32B Alligned —> 
Level 1 
32B Alligned 
32B Alligned peel? 
mee Level 3 
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Appendix E. Memory issues 


The Graphics Processor has several data memory requirements, including alignment requirements for the 
following types of data: 


* Texture and TLUT images. 

* Display lists. 

¢« Graphics FIFO. 

¢ External frame buffer (XFB). 


E.1 Rules of alignment 


These data objects must be aligned because the GP is very fast; data from the main memory is transferred 
in 32-byte chunks. Data alignment allows for simple and fast hardware. 


On other data objects, such as vertex, matrix and light arrays, additional hardware support eliminates the 
need for coarse alignment (these are 4-byte aligned). There are a large number of these data objects, and 
the memory consumption of each object is potentially low, so relaxing alignment restrictions helps to con- 
serve memory. 


The following table outlines the alignment rules for these objects. 


Table 24 - Memory alignment rules 


Data Object Alignment Rule Function 
Texture Map « Base address = 32B * GXInitTexObj (obj, image_ptr, ..) 
¢ Width and height * GXInitTexObjCI (obj, image _ptr, ...) 
rounded up to tile &. edoporetideet, 





boundary. Each tile is 
32B. 


« Base address of each 


map in mipmap is 
aligned to 32B 








Texture Lookup * Base address = 32B * GXInitTlutObj( , lut, ) 
Table (TLUT) lens 328 
Display List « Base address = 32B * GXBeginDisplayList (list, size) 
¢ Length = 32B * GXCallDisplayList (list, nbytes) 
Graphics FIFO ¢ Base address = 32B * GXWriteFifoBase (base, size) 
¢ Length = 32B * GXWriteFifoLimits (hi_water mark, 


lo_ water mark) 


0 I@ANiecSMLtORcics (ieSacl joie, 
write ptr) 


° GXReadFifoStatus(,,, £ifo_cnt) 


External Frame « Base address = 32B * GXCopyDisp (dest, ....) 


Butler. (XEB) * Width = 32B * vViSetNextBuffer (framebuffer, ..) 


« Length = 32B 
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E.2 Alignment assistance functions 


The Nintendo GameCube libraries contain functions that assist with alignment issues. Several examples 
appear in the table below. 


Table 25 - Alignment assistance functions 


Function Purpose 




















ATTRIBUTE ALIGN (32) CodeWarrior compiler static variable alignment directive. 

OSRoundUp32B (x) Macro to round up a pointer to 32B. 

OSRoundDown32B (x) Macro to round down a pointer to 32B. 

OSAlloc (size) Memory heap allocation function. Always returns 32B base 
address and length of memory buffer. 

GXGetTexBufferSz(x, y, Function to compute correct amount of memory required to store 

texel type, mipmap) a texture, based on the texture width, height, and texel type, and 
whether or not this texture is mipmapped. 

VIAlignFramebufferWidth Function to compute correct amount of memory required for the 

(width) frame buffer, based on the pixel width of the frame buffer. 














E.3. Data coherency 


Nintendo GameCube has multiple processors and hardware blocks that can update main memory. In addi- 
tion, the CPU and GP contain various data caches. Since the hardware does not maintain coherency of the 
data in main memory and various associated caches, there are three potential sources of coherency prob- 
lems: 


« When the CPU modifies or generates data destined for the GP. 
« When the CPU writes data through its write-gather buffer to cached memory. 
« When loading new data destined for the GP from the Nintendo GameCube Disc into main memory. 


Coherency problems may occur if the main memory used to store the data in these two latter cases were 
used for other graphics data. 
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Figure 77 - Data coherency 
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The arrows in the preceding figure represent the following typical operations: 


1. 


Oo ON on F w 


10. 
11. 


Loading texture images from the Nintendo GameCube Disc to main memory for a new game sector or 
level. 


Loading geometry vertex display list from the Nintendo GameCube Disc to main memory for a new 
game sector or level. 


Dynamic rendering of texture maps by the CPU. 

Dynamic generation or modification of vertices by the CPU. 
CPU animating lights and matrices. 

CPU generating display lists. 

CPU generating the graphics command stream. 

GP reading graphics command stream. 

GP reading display lists. 

GP accessing vertices for rendering. 


GP accessing textures for rendering. 


In addition, other combinations are possible, such as loading display lists from the Nintendo GameCube 
Disc, or writing command streams or display lists through the CPU cache. 
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E.3.1 About the DVD library loading graphics data 


When the DVD library loads data, the DVD API automatically invalidates the loaded main memory portion 
that resides in the CPU data cache. This feature provides a safe method for programmers to modify the 
Nintendo GameCube Disc loaded data without worrying about CPU data cache coherency. This DVD API 
feature activates by default; it can be deactivated by the programmer. 


The graphical data loaded by the DVD library may contain textures and vertices that have been already 
formatted for the GP to render. Therefore, invalidation of the vertex cache and texture cache regions may 
be necessary. 


Code 150 - DVDSetAutolnvalidation 


BOOL DVDSetAutoInvalidation(BOOL autoInval) ; 
void GXInvalidateVtxCache( void ); 

void GXInvalidateTexRegion(GXTexRegion *region); 
void GXInvalidateTexAll( void ); 


E.3.2 CPU generating or modifying graphics data 


The CPU has two means of writing to main memory: the write-gather buffer and the CPU cache hierarchy. 
The write-gather buffer is normally used to “blast” graphics commands into memory without affecting the 
cache. As a result, information sent through the write-gather buffer is not cache coherent. Care must be 
taken when using the write-gather buffer to avoid writing to areas of memory that may be found in the CPU 
cache. The cache flushing instructions shown below may be used to force data areas out of the CPU 
cache. 


If the CPU generates or modifies graphics data through its cache, the following memory types may end up 
containing stale data: 


* Main memory. 
¢ GP vertex cache and texture cache regions. 


To send the correct data to the GP, we need to flush the CPU data cache as well as invalidate the GP ver- 
tex or texture cache. The CPU typically animates data one frame ahead of the GP, so efficient techniques 
to maintain data coherency include: 


¢ Grouping all the CPU-modified graphics data in main memory sequentially, so that the block data 
cache flush is efficient. 


« — Invalidating the vertex cache, as well as the entire texture cache, at the beginning of each graphics 
frame. 


Code 151 - Commands to flush the CPU data cache 


void DCFlushRange(void* startAddr, u32 nBytes); // write out & invalidate 
void DCStoreRange(void* startAddr, u32 nBytes); // write out only 
void DCInvalidateRange(void* startAddr, u32 nBytes); // invalidate only 


E.3.2.1 Immediate mode 


If you use GX immediate mode APIs to update matrix or light data, you don’t need to worry about coher- 
ency issues. These APIs copy the arguments into the graphics FIFO, and include matrix and lighting func- 
tions with the form GXLoad* Imm. 
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E.3.2.2 Direct data 


If some vertex attributes in the vertex array descriptor are of the type GX_ DIRECT, this data is copied 
directly into the graphics FIFO, so users must be aware of coherency considerations. 





E.3.2.3 Indexed data 


If you use the GX indexed mode APIs to update matrix or light data, the software must flush the data cache 
in order to move the correct data into main memory. These APIs include matrix and lighting functions with 
the form GXLoad* Indx. 


Furthermore, the hardware implements indexed matrices and lights by passing this data through the vertex 
cache; therefore, you must invalidate the vertex cache also. (The only reason for this is to simplify the 
hardware design.) 


E.3.2.4 CPU scratchpad 


If the CPU L1 data cache is partitioned in scratchpad mode, you will need to DMA the modified data to 
main memory instead of flushing it from the normal data cache. 
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